Should instrumentation data, such as metrics, be transmitted over HTTPS?

Should information such as metrics, generated by an application and devoid of any business information, still be subject to encryption/decryption over HTTPS when transmitted within an organization's ecosystem that already sits behind firewalls?
The reason I am asking is that the metrics data does not give away any business information and is already behind a firewall. On top of that, the data is tremendous in size (time-series data in the order of millions of records per second). Does it make sense to reduce the computational cost of HTTPS, which forces encryption/decryption at every hop of the metrics' journey from source to destination, by applying an ingress policy that routes the packets via another port such as 8080 and skips encryption/decryption, thereby saving substantially on resource utilization and latency?
Or is this a known compromise that can turn into a vulnerability and lead to breaches in the system?
Context:
The applications being monitored are communicating over HTTPS.
The metrics scraping agents are asked to communicate over HTTP.
An ingress policy applied on the application node recognizes calls from the known metrics scraping agent and routes those packets via a non-HTTPS port such as 8080, in order to skip certificate validation and, mainly, the decryption of the metrics payload in the incoming request.
I am looking for suggestions and inputs, especially from someone who has had to solve this problem in practice. Anybody else with relevant information is more than welcome to add to it.
Any leads appreciated.
Thank you, in advance.

the metrics data does not give away any business information
I think this is not true. Metrics can also record traffic patterns in a business context (e.g., what users searched for or bought the most).
Also, metrics can accidentally contain sensitive information (they should not, but accidents happen); see the sketch after this list. Additionally, they can help attackers gather more data about:
Your infrastructure (what platforms you use)
Your environment (OS, Java version, etc.)
Your app topology (who calls whom)
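As a purely hypothetical illustration (the metric names, labels, and values below are invented), here is how easily an application-level counter can end up exposing business data and environment details on a metrics endpoint, using the Python prometheus_client library:

```python
# Hypothetical sketch: metric and label names are invented for illustration.
from prometheus_client import Counter, Info

# A free-text label means every user search term ends up in the metrics output.
searches = Counter(
    "shop_search_requests_total",
    "Search requests, labelled by query term",
    ["query"],
)

# Environment details that help an attacker profile the stack.
build_info = Info("shop_build", "Build and runtime metadata")
build_info.info({"os": "linux", "runtime": "openjdk-11.0.2", "framework": "spring-boot"})

def handle_search(user_query: str) -> None:
    # The raw user input becomes a label value, exposed to anyone who can
    # reach the scrape endpoint - in plain text if it is served over HTTP.
    searches.labels(query=user_query).inc()
```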
Please check the Fallacies of distributed computing:
#4 The network is secure
Being behind a firewall does not mean attackers can't get in; that is one of the reasons to use HTTPS on the internal network as well.

Related

Detecting VPNs and Proxies via latency

Consider a user who is using a service (say, an app backend) and routing their connection through an intermediary proxy and/or VPN. Specifically, let's assume the user is in Shanghai, China, the proxy is in Dallas, Texas, and the backend is on AWS. In theory, compared to a user who actually lives in Dallas, Texas (on the same network), the Shanghai user will have additional latency in sending/receiving events due to the Asia <-> USA round trip.
Questions:
Are there known/published methodologies for detecting this additional latency and thereby identifying impostors who are far away? The simplest I can think of is grouping by ISP and then looking for latency outliers.
Are there additional ways to honeypot such users? I'm not a network expert, but I think various sorts of media (e.g., video streaming) get different treatment on these networks, so I'm wondering whether it is possible to send additional event data to surface more precise latency anomalies.
Assumptions:
We can assume that we have plenty of user data from each network provider. We also have streams of event data that include client- and server-side timestamps for sending and receiving data.
I'm strictly interested in identifying users who are very far away from the IP source. I am NOT interested in methodologies that strictly try to classify an IP as a VPN (e.g., what MaxMind does in the link above).
Since you have stated you are not interested in the VPN or Relay identification aspect of it, but only latency detection for "far away" users, I will offer some ideas:
Run your own HTTP(S) measurement server that all clients must run an HTTP ping against. HTTP round-trip time in milliseconds will be acceptable for a broad-strokes classification of a user's "distance" from your server, assuming that the initial TCP handshake has already been completed by the client, all intermediaries, and your measurement server (a "pre-warmed" connection).
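A minimal sketch of that first idea, assuming a trivial endpoint such as https://measure.example.com/ping exists (the URL is a placeholder); the session keeps the connection alive so the TCP/TLS handshake is excluded from the timed samples:

```python
import statistics
import time

import requests  # assumed available; any HTTP client with keep-alive works

MEASUREMENT_URL = "https://measure.example.com/ping"  # placeholder endpoint

def http_rtt_ms(samples: int = 5) -> float:
    session = requests.Session()              # keep-alive: handshake paid once
    session.get(MEASUREMENT_URL, timeout=5)   # pre-warm TCP + TLS
    rtts = []
    for _ in range(samples):
        start = time.perf_counter()
        session.get(MEASUREMENT_URL, timeout=5)
        rtts.append((time.perf_counter() - start) * 1000)
    return statistics.median(rtts)  # median is robust to one-off spikes

if __name__ == "__main__":
    print(f"median HTTP RTT: {http_rtt_ms():.1f} ms")
```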
Use an IP geolocation API. These will give you the country code (almost always accurate) and an approximate latitude and longitude (broadly accurate) that you can use to calculate the distance from your server. This of course assumes that the client's public IP is visible to you and not completely obfuscated by the intermediaries.
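And a rough sketch of the second idea, assuming the geolocation API has already returned a latitude/longitude for the client's public IP (the coordinates below are placeholders); the great-circle distance gives a lower bound on how much latency is physically plausible:

```python
import math

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    # Great-circle distance between two points on Earth, in kilometres.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

backend = (32.78, -96.80)     # Dallas, TX (approximate)
client_geo = (32.90, -96.60)  # what the geo-IP lookup reported for the client

distance_km = haversine_km(*client_geo, *backend)
# If the measured HTTP round-trip time is far larger than this distance can
# explain (light in fibre covers very roughly 100 km per millisecond one way),
# the real user is probably much further away than the IP address suggests.
print(f"geo distance: {distance_km:.0f} km")
```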

Localised IP assignment + DDoS prevention + Google billing

We are very new to Google Cloud and still learning.
I have two questions in mind.
The first is:
Can I assign localised IP addresses to virtual instances? For example, I would like one website to be served from a German IP range and another website from an Italian IP range.
Where is the best place to start, and is this possible in the cloud?
The second is:
We had a DDoS attack in the cloud and resource usage peaked during the attack. Will Google charge an extreme price for that peak period, or will billing be normal?
The second question leads to a third:
We are using Cloudflare for our domains. Is there a reliable way to prevent DDoS attacks on Google Cloud?
I appreciate your time and answers.
To your first point, are you after finding the shortest path between your users and wherever you serve your content? If that's the case, you can simply put a load balancer in front of your backend services within Google Cloud, with a global public forwarding IP address, and the service itself will take care of directing the traffic to the nearest group of machines available. Here is an example of an HTTP(S) Load Balancer setup.
Or is localization what you are trying to achieve? In that case, I'd rely on more standard ways of handling the language of choice, such as using browser settings (or user account settings, if they exist) or the Accept-Language header. This is a valuable resource from LocalizeJS.
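As an illustration of the Accept-Language route, a small framework-agnostic sketch (the supported-language list is made up) that picks the best match from the header:

```python
SUPPORTED = ["en", "de", "it"]  # hypothetical set of localised site versions

def pick_language(accept_language: str, default: str = "en") -> str:
    # A typical header looks like: "de-DE,de;q=0.9,en;q=0.6"
    candidates = []
    for part in accept_language.split(","):
        piece = part.strip()
        if not piece:
            continue
        lang, _, q = piece.partition(";q=")
        try:
            weight = float(q) if q else 1.0
        except ValueError:
            weight = 0.0
        candidates.append((weight, lang.split("-")[0].lower()))
    # Highest-weighted language that we actually support wins.
    for _, lang in sorted(candidates, reverse=True):
        if lang in SUPPORTED:
            return lang
    return default

print(pick_language("de-DE,de;q=0.9,en;q=0.6"))  # -> "de"
```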
Lastly, if you are determined to have multiple versions of your application deployed for the different languages you support, you could still have an intermediate service that determines the source of the request using IP-based lookups and redirects the user to the appropriate version. That said, my feeling is that this is a more traditional behaviour; in a world of client applications that are responsive and localized on the spot, the extra hop/redirect could annoy some users.
To your second point, there are a number of protections already built into some services within Google Cloud to help you protect your applications and machines in different ways. On the DDoS front, you can benefit from policies and protections on the CDN side, where you get cache- and scaling-based preventive measures.
In addition to that, if you have a load balancer in front of your content, you can benefit from protections at layers 3, 4 and 7 of the OSI model. That includes typical HTTP floods, SYN floods, port exhaustion and NTP amplification attacks.
What this means is that in many of these situations your infrastructure will not even notice the attack, as it will be mitigated before it reaches your infrastructure (and therefore you will not be billed for it). That said, I have heard of and experienced situations in which these protections did not act in a timely fashion, or were not triggered at all. In those scenarios, your system may need to handle the extra load itself. However, especially when the attack was obviously malicious and documented as something Google Cloud was supposed to handle, there is a chance to make your case with Google in order to get some support on the topic.
A bit more on that here.
Hope this is helpful.

How to dynamically assign a particular client (browser) to one of many servers?

I am building a service which requires me to dynamically launch and close servers at many locations around the world (for example, using AWS). When a user visits my domain, they need to be assigned to a local server with the lowest latency.
By assignment, I mean that when the client makes an AJAX call to, for example, example.com/getData, it should go directly to the particular server it has been assigned to. Different servers will be doing different computation, so it is not sufficient to have some kind of general load balancing.
What general mechanisms/technologies would allow me to 1) assess the latency between a particular client and any server under my control, and 2) assign a particular client to a particular server? I cannot just use IP addresses, for example, since JavaScript has domain-name-based restrictions.
Thanks
Note: I do not have enough reputation to link all the technologies in this response, so sometimes you will see the links copied in plain text.
1) Assigning users to a local server with the lowest latency is not always possible.
Sometimes the geographically closest server to a user is unexpectedly the one with the highest latency.
Finding the lowest-latency path between your (running) servers and your users is not an easy task.
There might be many hops (routers) between the client and the server, and any of them can have problems at any time: route updates, packet congestion, and so on.
The quickest way to assess latency is a ping, but firewalls may block it.
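If ICMP is blocked, one rough fallback is to time a plain TCP connect to a port the firewall already allows; a sketch (the host is a placeholder):

```python
import socket
import time

def tcp_connect_ms(host: str, port: int = 443, timeout: float = 3.0) -> float:
    # Times only the TCP three-way handshake; no application data is sent.
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000

print(f"{tcp_connect_ms('example.com'):.1f} ms")
```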
So the best way to achieve this is to use anycast.
All the major CDN providers implement this method. Some use TCP anycast, which seems not to be recommended, and others use UDP anycast; it is an open debate.
Anyway, in order to implement anycast you need to be able to peer with the ISPs' routers, and normally this is not possible. Additionally, there are good peers and bad peers.
Finally, all of this requires deep knowledge of routing protocols and the TCP/IP stack.
A quick and dirty solution could be to use BIND with the GeoIP patch.
This lets you define specific DNS query responses per country.
What I mean is that, for instance, if you have a server in the UK and one in the US, you can configure BIND so that users coming from Europe hit the UK server and users coming from the US hit the US server.
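To make the idea concrete, here is a toy sketch of the decision such a GeoIP-aware setup makes, written in Python rather than as actual BIND configuration; it assumes the geoip2 package and a GeoLite2 country database are available, and the addresses and path are placeholders:

```python
# Toy sketch of GeoIP-based DNS answers; not an actual BIND configuration.
import geoip2.database
import geoip2.errors

UK_SERVER = "192.0.2.10"      # placeholder documentation address
US_SERVER = "198.51.100.10"   # placeholder documentation address
EUROPEAN_COUNTRIES = {"GB", "DE", "FR", "IT", "ES", "NL"}  # illustrative subset

reader = geoip2.database.Reader("/path/to/GeoLite2-Country.mmdb")  # placeholder path

def a_record_for(client_ip: str) -> str:
    # Same hostname, different A record depending on where the client appears to be.
    try:
        country = reader.country(client_ip).country.iso_code
    except geoip2.errors.AddressNotFoundError:
        return US_SERVER
    return UK_SERVER if country in EUROPEAN_COUNTRIES else US_SERVER
```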
2) To assign a particular client to a particular server, you can use the technique described in point 1, or you can use a proxy and sticky sessions.
HAProxy is a good product to achieve this. (URL: xy.1wt.eu)
3) If you use point 1, you will not have problems with cross-domain AJAX calls. In fact, it is completely transparent to the client. For instance, for the same domain example.com, a user coming from the US will resolve it to 1.1.1.1, whereas a user coming from Germany will resolve example.com to 2.2.2.2 (the IP addresses are fake and used just as an example).
On a side note, a solution for cross-domain AJAX calls is JSONP, which does have some drawbacks, such as the lack of support for POST.
If I were you, I would go with BIND and GeoIP, because it would solve all three problems at once (apart from the latency issue, since it is not always true that the geographically closest server is the one with the lowest latency).

What are the benefits of not supporting UDP in a cloud network? (thinking of the Windows Azure case)

Rackspace and Amazon do handle UDP, but Azure and GAE (the platform most similar to Azure) do not.
I am wondering what the expected benefits of this restriction are. Does it help with fine-tuning the network? Does it ease load balancing? Does it help secure the network?
I suspect the reason is that UDP traffic has neither a defined lifetime nor a defined packet-to-packet relationship. This makes it hard to load-balance and hard to manage: when you don't know how long to hold the path open, you end up using timers, which is a problem for some NAT implementations too.
There's another angle not really explored here so far. UDP traffic is also a huge source of security problems, specifically DDoS attacks.
By blocking all UDP traffic, Azure can mitigate these attacks more effectively. Nearly all large-bandwidth attacks, which are by far the hardest to deal with, are amplification attacks of some sort and are most often UDP-based. Allowing that traffic past the border of the network greatly increases the likelihood of service disruption, regardless of QoS assurances.
A second facet of the same story is that by blocking UDP they prevent people from hosting insecure DNS servers, and thus prevent Azure from being the source of these large-scale amplification attacks. This is actually a very good thing for the internet overall, as I would expect the connectivity of Azure's data centers to be significant. By contrast, I have had servers in AWS send non-stop UDP attacks at our datacenter for months on end, and I could not get the abuse team to respond.
The only thing that comes to my mind is that maybe they wanted to avoid their cloud being accessed through an unreliable transport protocol.
Along with scalability, reliability is one of the key aspects of Azure. For example, SQL Azure and Azure Storage data is always replicated in at least three places, and roles with at least two instances have a 99.95% uptime SLA.
Of course, despite its partial unreliability, UDP has its use cases, some of them enumerated in the comments from the feature voting site, but maybe those use cases are not a target for the Azure platform.

how to dispatch network requests to the (geographically) closest server

I'm a Java coder and not very familiar with how networks work (other than basic UDP/TCP connections).
Say I have servers running on machines in the US, Asia, Latin America and Europe. When a user requests a service, I want their request to go to the server closest to them.
Is it possible for me to have one address: mycompany.com, and somehow get requests routed to the appropriate server? Apparently when someone goes to cnn.com, they receive the pictures, videos, etc. from a server close to them. Frankly, I don't see how that works.
By the way, my servers don't serve web pages; they serve other services, such as stock market data... just in case that is relevant.
Since I'm a programmer, I'm interested to know how one would do it in software. Since this is little more than an idle curiosity, pointers to commercial products or services won't be very helpful in understanding this problem :)
One simple approach would be to look at the first byte (the old Class A octet) of the IP address in the incoming UDP DNS request and, based on that, return the appropriate geo-located IP.
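A crude sketch of that idea; the octet ranges and server addresses below are invented purely for illustration (real IPv4 allocation is nowhere near this tidy, which is why a proper geolocation database is usually used instead):

```python
# Invented ranges and addresses, purely to illustrate the shape of the approach.
REGIONAL_SERVER_BY_FIRST_OCTET = [
    (range(1, 60), "203.0.113.10"),     # pretend: Asia-Pacific front end
    (range(60, 130), "198.51.100.10"),  # pretend: Americas front end
    (range(130, 224), "192.0.2.10"),    # pretend: Europe front end
]
DEFAULT_SERVER = "198.51.100.10"

def server_for(client_ip: str) -> str:
    first_octet = int(client_ip.split(".")[0])
    for octets, server in REGIONAL_SERVER_BY_FIRST_OCTET:
        if first_octet in octets:
            return server
    return DEFAULT_SERVER

print(server_for("81.2.69.160"))  # -> the pretend "Americas" front end
```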
Another approach would be a little more complicated. Instead of using the server that is geographically closest to the user, you could use the server that has the lowest latency for that user.
The lower latency will provide faster transfer speeds while being easier to calculate than geographic location.
For a much more detailed look, check out this article on CDNs (pay attention to the Technology Section):
Content Delivery Network - Wikipedia
These are the kinds of networks that the large sites use to distribute their content over the net (Akamai is a popular example). As you can see, things can get pretty complicated pretty quickly with CDNs having their own proprietary protocols, etc...
Update: I didn't see the disclaimer about commercial solutions at the end of the original post. I'll leave this up for those who may find it of interest.
--
Take a look at http://ultradns.com/. A managed DNS service like that may be just what you need to accomplish what you are looking for.
Amazon.com, Forbes.com, Oracle, all use them...
Quote From http://ultradns.com/solutions/traffic.html:
UltraDNS Traffic Management solution provides a set of tools allowing IT administrators to define load balancing configurations for content servers residing in one or more geographic locations. The Traffic Management Solution manages traffic directed to the servers by dynamically changing the responses to DNS requests. Load balancing is performed based on dynamic metrics obtained from the host servers on a continual monitoring basis. The UltraDNS Traffic Management solution is not a single application, but combines the capabilities of several existing UltraDNS systems to control traffic, manage site failures, and optimize web content systems.
One approach is, as Jeff mentioned, using the IP address: http://en.wikipedia.org/wiki/Geolocation_software
In my experience, this is precise to the nearest relatively large city (in the US at least). There are several open databases to aid in this (see the wiki link). You can then generate image tags, download links and such based on this information.
As for locating the nearest server, I'm sure you can think of a few ways to do it. For instance, if the best resolution you can get is the major city, you can look that city up in a list of latitudes/longitudes and calculate the nearest server from that, as in the sketch below.
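A short sketch of that last step, assuming the lookup has already produced the city's coordinates (the server fleet and coordinates are made up; the distance function is the standard haversine great-circle formula):

```python
import math

SERVERS = {  # hypothetical fleet: name -> (latitude, longitude)
    "us-east": (39.0, -77.5),
    "eu-west": (53.3, -6.3),
    "ap-east": (22.3, 114.2),
}

def haversine_km(a: tuple, b: tuple) -> float:
    # Great-circle distance in kilometres between two (lat, lon) points.
    (lat1, lon1), (lat2, lon2) = a, b
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def nearest_server(client_latlon: tuple) -> str:
    return min(SERVERS, key=lambda name: haversine_km(client_latlon, SERVERS[name]))

print(nearest_server((40.7, -74.0)))  # New York -> "us-east"
```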
