Replace URL in HTML files on the load balancer based on geolocation? - nginx

I have an HTTP web server providing static HTML pages.
Within the pages, images & CSS are loaded from a fixed domain, like:
<img src="http://assets.mysite.com/1.jpg" />
There are actually several different domains serving the same files, for example:
assets-us.mysite.com
assets-eu.mysite.com
assets-asia.mysite.com
I want the load balancer to replace the domain "assets.mysite.com" with one of the others according to the visitor's geolocation.
For example, when I access the URL from Europe, the HTML I get is:
<img src="http://assets-eu.mysite.com/1.jpg" />
When I access the same URL from Japan, the HTML I get is:
<img src="http://assets-asia.mysite.com/1.jpg" />
I would prefer NGINX (or G-WAN). Is it possible to achieve this with only some configuration or a script on the load balancer? And how is performance affected by the replacement?

If your goal is to perform as well as possible, you should do geo-IP load balancing at the DNS level - users are directed to the right region before they ever query the web server. CDNs work this way.
But if you can't do that and want to manage the load balancing from the web server, then the best way to scale is to use an AS (the networks used by ISPs) lookup table to find which region users are located in.
Doing this, as opposed to searching IP addresses, immensely reduces the database size and therefore speeds up lookups. IP databases offer more detail but are much larger.
For G-WAN, you would write a connection handler, or a content-type handler if you want to implement different logic for different MIME types (the latter might also ease development, since you won't have to parse the request to find the resource type).
If the database is stored locally (preferably in RAM), G-WAN C/C++/C# scripts, if properly implemented, won't increase latency in a noticeable manner.
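To make the rewrite step concrete (deliberately leaving out G-WAN's handler API, which is not reproduced here), below is a minimal C# sketch of the substitution a handler would perform once some geo/AS lookup has produced a region code. The region codes, the lookup and the class name are assumptions; only the hostnames come from the question.

    using System.Collections.Generic;

    static class AssetRewriter
    {
        // Region code -> regional asset host. The codes are illustrative; whatever
        // your geo/AS database returns would be the key here.
        private static readonly Dictionary<string, string> RegionHosts =
            new Dictionary<string, string>
            {
                { "EU",   "assets-eu.mysite.com"   },
                { "ASIA", "assets-asia.mysite.com" },
                { "US",   "assets-us.mysite.com"   },
            };

        public static string Rewrite(string html, string regionCode)
        {
            string host;
            if (!RegionHosts.TryGetValue(regionCode, out host))
                host = "assets-us.mysite.com";                 // fallback region
            // Replace the generic asset domain with the regional one before the page is sent.
            return html.Replace("assets.mysite.com", host);
        }
    }

The rewrite itself is a single in-memory string replacement per page, so, as noted above, the dominant cost is the geolocation lookup; with the database in RAM the added latency should be negligible.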

Related

Website - blocking views from non-specified country locations

I am looking for the most reliable, accurate and quick means possible of adding some .htaccess code to block visits to a website from countries / IPs which are not on the whitelist of countries I want to allow access from. I have looked at https://www.ip2location.com/free/visitor-blocker which seems to offer a solution - for the 4 countries I want to allow access from, it has created a 4.1MB .htaccess file! Will this mean slow access when someone attempts to view the site? I guess using a free service like this means the data is likely nowhere near comprehensive?
Does anyone have any suggestions for a good way to allow only visitors from a few countries to access a website?
It sounds like the service you used basically tried to brute-force the blacklist. If you look into the .htaccess file, I'm sure you will see a long list of hard-coded IP blocks.
In my opinion this is a terrible way to handle a geographic blacklist. To your original question - there is no "most reliable, most accurate, and quickest" method. Those are separate criteria, and you will need to prioritise one over the others.
For performance, you could consider blacklisting at the routing level / DNS server / proxy rather than in the web server itself; a huge .htaccess file is obviously not going to be the quickest option. There are also Apache modules that let you use a local database to compare the incoming IP address against a list of known IP blocks for the blacklisted countries. One of the main issues with this is that you need to constantly update the database to pick up new IP blocks.
In my opinion the "best" method is a simple redirect at the application layer using server-side code. Several geolocation APIs exist where you can send in an IP or hostname and get back a country of origin. An example:
// Look up the visitor's country via a remote geolocation API, then redirect based on
// the returned country code (here: US visitors are sent to another site).
$xml = new SimpleXMLElement(file_get_contents('http://www.freegeoip.net/xml/{IP_or_hostname}'));
if ($xml->CountryCode == "US") {
    header('Location: http://www.google.com');
    exit; // stop executing the page after issuing the redirect
}
There are two ways to block a visitor on a web server. One is using the firewall (.htaccess etc.) and the other is using server-side scripting (PHP etc.).
If you are concerned about the performance of the firewall option, you can download the IP2Location LITE database from http://lite.ip2location.com and host the database on your own server. For every connection, you query the visitor's IP address to find their country, and you can then redirect or block them in your PHP code. You can find the complete steps at https://www.ip2location.com/tutorials/redirect-web-visitors-by-country-using-php-and-mysql-database
There is also the option of using a remote geolocation API. However, we do not suggest this method because of network latency: the API queries will slow down the experience for every user.
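The linked tutorial is written for PHP and MySQL; purely for illustration, here is a rough C# sketch of the same local lookup against the LITE DB1 table, assuming its usual layout (ip_from, ip_to, country_code) and a hypothetical table name - adjust the schema and connection details to whatever you actually import.

    using System;
    using System.Data.SqlClient;
    using System.Net;

    static class GeoLookup
    {
        // Convert a dotted IPv4 address into the integer form the LITE tables use.
        static long ToIpNumber(string ip)
        {
            byte[] b = IPAddress.Parse(ip).GetAddressBytes();   // IPv4 only in this sketch
            return (long)b[0] * 16777216 + (long)b[1] * 65536 + (long)b[2] * 256 + b[3];
        }

        // Returns the two-letter country code for the visitor, or null if no range matches.
        public static string CountryCodeFor(string connectionString, string visitorIp)
        {
            long ipNumber = ToIpNumber(visitorIp);
            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand(
                "SELECT TOP 1 country_code FROM ip2location_db1 " +   // hypothetical table name
                "WHERE ip_from <= @ip AND ip_to >= @ip", conn))
            {
                cmd.Parameters.AddWithValue("@ip", ipNumber);
                conn.Open();
                return cmd.ExecuteScalar() as string;
            }
        }
    }

A request handler would then compare the returned code against the whitelist and redirect or return a 403 for everything else.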

I'm being scraped, how can I prevent this?

Running IIS 7, a couple of times a week I see a huge number of hits in Google Analytics from one geographic location. The sequence of URLs they are viewing is clearly being generated by some algorithm, so I know I'm being scraped for content. Is there any way to prevent this? It is so frustrating that Google doesn't just give me an IP.
There are plenty of techniques in the anti-scraping world; I'll just categorise them. If you find something missing in my answer, please comment.
A. Server-side filtering based on web requests
1. Blocking suspicious IPs
Blocking suspicious IPs works well, but today most scraping is done through IP proxies, so in the long run it won't be effective. In your case you get requests from the same IP geolocation, so if you ban this IP, the scrapers will surely switch to IP proxying, thus staying IP-independent and undetected.
2. Using DNS-level filtering
A DNS firewall is another anti-scraping measure. In short, you point your web service at a private domain name server (DNS) network that filters and drops bad requests before they reach your server. This sophisticated measure is offered by some companies for complex website protection, and you can dig deeper by looking at an example of such a service.
3. Use a custom script to track user statistics and drop troublesome requests
As you've mentioned, you've detected the algorithm the scraper uses to crawl URLs. Write a custom script that tracks the requested URLs and, based on that, switches on protection measures. For this you have to activate a [shell] script in IIS. A side effect is that response times will increase, slowing down your service. Also, the algorithm you've detected might change, leaving this measure useless.
4. Limit request frequency
You might set a limit on the frequency of requests or on the amount of downloadable data. The restrictions must be set with the usability of a normal user in mind, while dropping or delaying the scraper's insistent requests (a minimal sketch of a per-IP rate limiter follows after this list). Yet if the scraper gets reconfigured to imitate common user behaviour (through well-known tools such as Selenium, Mechanize or iMacros), this measure will fail.
5. Setting a maximum session length
This measure is a good one, but modern scrapers usually perform session authentication, so cutting off session time is not that effective.
B. Browser-based identification and prevention
1. Set CAPTCHAs for target pages
This is an old technique that for the most part does solve the scraping issue. Yet if your scraping opponent leverages one of the anti-CAPTCHA services, this protection will most likely fail.
2. Injecting JavaScript logic into the web service response
JavaScript code is delivered to the client (the user's browser or the scraping server) before or along with the requested HTML content. This code computes and returns a certain value to the target server. Based on this test, the HTML might be malformed or even withheld from the requester, thus shutting out malicious scrapers. The logic might be placed in one or more JavaScript-loadable files, and it can be applied not just to the whole content but also to only certain parts of the site (e.g. prices). To bypass this measure, scrapers need to turn to even more complex scraping logic (usually full JavaScript execution) that is highly customisable and thus costly.
C. Content-based protection
1. Disguising important data as images
This method of content protection is widely used today, and it does prevent scrapers from collecting the data. Its side effect is that data obfuscated as images is hidden from search engine indexing, which hurts the site's SEO. And if scrapers leverage an OCR system, this kind of protection can again be bypassed.
2. Frequent page structure changes
This is a fairly effective way to protect against scraping, and it works best when you change not just element ids and classes but the entire hierarchy. The latter involves restructuring the styling, which imposes additional costs, while the scraper side must adapt to each new structure if it wants to keep scraping the content. There are few side effects if your service can afford it.
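As promised under A.4, here is a minimal sketch of the per-IP rate-limiting idea written as an ASP.NET IHttpModule (which IIS 7 can run in integrated mode). The threshold and module name are illustrative, and a production version would also expire old entries.

    using System;
    using System.Collections.Concurrent;
    using System.Web;

    public class SimpleRateLimitModule : IHttpModule
    {
        private const int MaxRequestsPerMinute = 120;   // illustrative threshold

        private class Counter { public DateTime WindowStart; public int Count; }

        private static readonly ConcurrentDictionary<string, Counter> Counters =
            new ConcurrentDictionary<string, Counter>();

        public void Init(HttpApplication app)
        {
            app.BeginRequest += (sender, e) =>
            {
                string ip = app.Context.Request.UserHostAddress ?? "unknown";
                Counter counter = Counters.GetOrAdd(ip, _ => new Counter { WindowStart = DateTime.UtcNow });

                lock (counter)
                {
                    // Reset the counter when the one-minute window has passed.
                    if ((DateTime.UtcNow - counter.WindowStart).TotalMinutes >= 1)
                    {
                        counter.WindowStart = DateTime.UtcNow;
                        counter.Count = 0;
                    }

                    counter.Count++;
                    if (counter.Count > MaxRequestsPerMinute)
                    {
                        app.Context.Response.StatusCode = 429;   // Too Many Requests
                        app.Context.Response.End();              // short-circuit the request
                    }
                }
            };
        }

        public void Dispose() { }
    }

The module is registered in web.config under system.webServer/modules; whether the limit applies per IP, per session or per URL pattern is a policy choice.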

Running SOAP and RESTful on the same URL

Say we have a website that responds to the host header "kebab-shop.intra.net".
Is it possible to have both SOAP and REST under this URL?
That is, both of these are handled within the deployed code:
kebab-shop.intra.net/takeaway.asmx
kebab-shop.intra.net/kebab/get/...
I've been told this can't be done, without much explanation - like this answer. That could well be true; I'm a database monkey, but I'd like some thoughts on what options I do or don't have, please.
Thoughts so far:
Separate host headers, e.g. add kebab-shop-rest.intra.net
Separate web sites
Ideally, I'd like to have one web site, one URL domain name, one host header. Zero chance?
This is IIS 6 with .NET 4, and we have some large corporate limitations that mean we are limited to a zip file dropped into the relevant folder used by the web site. This is intended to allow our clients to migrate without incurring the large corporate support, infrastructure and deployment overhead. The co-existence will only last a month or three.
Edit: I'm asking because I'm not a web developer. If my terms are wrong, this is why...
So... I want both SOAP and REST on kebab-shop.intra.net on IIS 6 without complexity.
That is, both of these are handled within the deployed code.
* kebab-shop.intra.net/takeaway.asmx
* kebab-shop.intra.net/kebab/get/...
Yes, that should definitely be possible. If you have a single WCF service, you can easily expose two separate endpoints for the same service - one using e.g. basicHttpBinding (roughly equivalent to ASMX), and another with webHttpBinding (REST).
The complete URLs must be different - but the first part can be the same, I believe.
If you're hosting in IIS 6, you need one virtual directory, and that will partly dictate your SOAP endpoint - it will have to be something like:
http://kebab-shop.intra.net/YourVirtDir/takeaway.svc
(or http://kebab-shop.intra.net/YourVirtDir/takeaway.asmx if you insist on using a legacy ASP.NET web service)
and the REST endpoint can live inside the same virtual directory and define URI templates, e.g. you could have something like:
http://kebab-shop.intra.net/YourVirtDir/TakeKebab/gbn
or similar URLs.
However: checking this out myself, I found that you cannot have both service endpoints "live" off the same base address - one of them has to have an additional "relative address" associated with it.
So either you add e.g. "SOAP" to your SOAP endpoint:
http://kebab-shop.intra.net/YourVirtDir/takeaway.svc/SOAP/GetKebab
http://kebab-shop.intra.net/YourVirtDir/TakeKebab/gbn
or you add something to your REST endpoint:
http://kebab-shop.intra.net/YourVirtDir/takeaway.svc/GetKebab
http://kebab-shop.intra.net/YourVirtDir/REST/TakeKebab/gbn
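For illustration only, here is a minimal self-hosted sketch of the two-endpoint arrangement (under IIS you would normally express the same thing in web.config); the service, contract and addresses below are hypothetical.

    using System;
    using System.ServiceModel;
    using System.ServiceModel.Description;
    using System.ServiceModel.Web;

    [ServiceContract]
    public interface ITakeaway
    {
        [OperationContract]
        [WebGet(UriTemplate = "TakeKebab/{id}")]   // URI template used by the REST endpoint
        string GetKebab(string id);
    }

    public class TakeawayService : ITakeaway
    {
        public string GetKebab(string id) { return "kebab " + id; }
    }

    class Program
    {
        static void Main()
        {
            var baseAddress = new Uri("http://localhost:8000/YourVirtDir/takeaway");
            using (var host = new ServiceHost(typeof(TakeawayService), baseAddress))
            {
                // SOAP endpoint, reachable under .../takeaway/soap
                host.AddServiceEndpoint(typeof(ITakeaway), new BasicHttpBinding(), "soap");

                // REST endpoint, reachable under .../takeaway/rest/TakeKebab/{id}
                ServiceEndpoint rest = host.AddServiceEndpoint(typeof(ITakeaway), new WebHttpBinding(), "rest");
                rest.Behaviors.Add(new WebHttpBehavior());

                host.Open();
                Console.WriteLine("Both endpoints are listening; press Enter to stop.");
                Console.ReadLine();
            }
        }
    }

The two relative addresses ("soap" and "rest") are exactly the extra segment described above - without them the two endpoints would collide on the same base address.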
I don't see a reason why you can't. Typically your SOAP endpoints will be one specific URL per service, whereas for resources exposed via REST you will have one URL per resource (following URL patterns).
Example URLs for SOAP:
http://kebab-shop.intra.net/soap/service1
http://kebab-shop.intra.net/soap/service2
Example URL patterns for REST:
http://kebab-shop.intra.net/rest/{resourcetype}/{id}/
e.g.: http://kebab-shop.intra.net/rest/monkeys/32/
etc...

Bandwidth monitoring in ASP.NET

Hi, we are developing a multi-tenant application in ASP.NET with a separate database for each tenant, and one of the requirements is to monitor the bandwidth usage of each tenant.
I have tried searching but haven't found much help on the topic. We want to monitor exactly how much bandwidth is being used by each tenant, while each tenant can have its own top-level domain, a subdomain, or a combination of both.
So what are the available options? The ones I can think of are:
IIS log monitoring, i.e. a separate application which calculates the bandwidth for each tenant.
Log each request and response for a tenant from within the application, and then calculate the total bandwidth usage from that.
Use some third-party component, if one is available.
So what do you think would be the best approach? And is there any other way to do this?
OK, here is an idea (that I have not tested - I leave that to you).
In Global.asax,
use one of these event handlers (find the one that sees a valid final size):
Application_PostRequestHandlerExecute
Application_ReleaseRequestState
and get the size that you have sent with:
Response.Filter.Length
Needless to say, you get the file name of the call using:
HttpContext.Current.Request.Path
These handlers are called for every single request, so you can capture the size there and do the rest.
I must note that you first need to test this idea to see if it works, and maybe improve it. Also keep in mind that if you compress pages on the server, the length will not be correct, and you may need to do the compression yourself in Global.asax to get the actual length.
Hope this helps.
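Since Response.Filter is just a Stream and Length is not guaranteed to be supported by every filter in the chain, a common variation on this idea (a rough, untested sketch - the class names are made up) is to install a small counting stream as the response filter and log what flows through it:

    using System;
    using System.IO;
    using System.Web;

    // Pass-through stream that counts every response-body byte written through it.
    public class CountingFilter : Stream
    {
        private readonly Stream _inner;
        public long BytesWritten { get; private set; }

        public CountingFilter(Stream inner) { _inner = inner; }

        public override void Write(byte[] buffer, int offset, int count)
        {
            BytesWritten += count;
            _inner.Write(buffer, offset, count);
        }

        public override void Flush() { _inner.Flush(); }

        // The rest of the Stream contract is boilerplate for a write-only wrapper.
        public override bool CanRead { get { return false; } }
        public override bool CanSeek { get { return false; } }
        public override bool CanWrite { get { return true; } }
        public override long Length { get { return BytesWritten; } }
        public override long Position { get { return BytesWritten; } set { throw new NotSupportedException(); } }
        public override int Read(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
        public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
        public override void SetLength(long value) { throw new NotSupportedException(); }
    }

    // Wiring in Global.asax.cs: attach the filter early, read the total at the end.
    public class Global : HttpApplication
    {
        protected void Application_BeginRequest(object sender, EventArgs e)
        {
            var filter = new CountingFilter(Response.Filter);
            Context.Items["byteCounter"] = filter;   // stash it so Application_EndRequest can read it
            Response.Filter = filter;
        }

        protected void Application_EndRequest(object sender, EventArgs e)
        {
            var filter = Context.Items["byteCounter"] as CountingFilter;
            if (filter != null)
            {
                // Map Request.Url.Host (or Request.Path) to a tenant and record filter.BytesWritten.
                // Note this counts response body bytes only, not HTTP headers or request bytes.
            }
        }
    }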
Well, since the IIS logs already contain the request size and response size, it doesn't seem like too much trouble to develop a small tool that parses them and calculates the totals per day/week/month/whatever.
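As a rough sketch of such a tool (assuming W3C extended logging with the cs-host, sc-bytes and cs-bytes fields enabled - they are not all on by default), the parser only has to read the #Fields: header and sum the byte columns per host:

    using System;
    using System.Collections.Generic;
    using System.IO;

    class IisLogBandwidth
    {
        // Pass the path of a W3C extended IIS log file as the first argument.
        static void Main(string[] args)
        {
            var totals = new Dictionary<string, long>(StringComparer.OrdinalIgnoreCase);
            int hostIdx = -1, scBytesIdx = -1, csBytesIdx = -1;

            foreach (string line in File.ReadLines(args[0]))
            {
                if (line.StartsWith("#Fields:"))
                {
                    // Field order can vary per site, so take the column positions from the header.
                    string[] fields = line.Substring("#Fields:".Length).Trim().Split(' ');
                    hostIdx    = Array.IndexOf(fields, "cs-host");
                    scBytesIdx = Array.IndexOf(fields, "sc-bytes");
                    csBytesIdx = Array.IndexOf(fields, "cs-bytes");
                    continue;
                }
                if (line.StartsWith("#") || hostIdx < 0 || scBytesIdx < 0 || csBytesIdx < 0)
                    continue;   // skip comments and lines before a usable header

                string[] cols = line.Split(' ');
                string host = cols[hostIdx];
                long bytes = long.Parse(cols[scBytesIdx]) + long.Parse(cols[csBytesIdx]);

                long current;
                totals.TryGetValue(host, out current);
                totals[host] = current + bytes;
            }

            foreach (var kv in totals)
                Console.WriteLine("{0}: {1} bytes", kv.Key, kv.Value);
        }
    }

Grouping by cs-host maps the traffic back to each tenant's domain or subdomain; totals per day just mean keying the dictionary on host plus date.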
Trying to segment traffic based on host is difficult, in my experience. Instead, if you give each tenant their own IP(s) for the application, you should be able to find programs that monitor bandwidth per IP.
ADDITION: Is the structure of IIS such that you have one website to rule them all for every tenant, and on login the system forks to the proper database? If so, this may create versioning problems, in that all tenants' sites will have to have exactly the same schema and would all need to be updated simultaneously whenever an application update requires a schema change.
Another structure, which sounds like what you may have, is that each tenant has their own website, like so:
tenant1_site/appvirtualdir
tenant2_site/appvirtualdir
...
where appvirtualdir points to the same physical path for all tenants' sites. When all clients are on the same application version, they are literally all running the same code. If you have this scenario and some sort of authentication, then you will need one IP per tenant anyway because of SSL: SSL binds only to IP and port, unlike non-SSL, which binds to IP, port and host. If that is the case, then monitoring traffic per IP will still be simpler and more accurate, as it can be done at the router or with a network monitor.

ASP.NET: Does not download content in parallel

In an ASP.NET application, how is it possible to have all PNG, CSS, JavaScript and other resources downloaded in parallel?
I am monitoring with Fiddler and found that the content is downloaded one item after another.
That is actually browser (client) behaviour, in accordance with the HTTP 1.1 specification: the guideline is to limit simultaneous downloads to two per hostname.
http://www.yuiblog.com/blog/2007/04/11/performance-research-part-4/
While you may be able to alter your browser's settings to download more per hostname, that only affects your machine, not those of the other visitors out in the Internet wilderness. One way to trick clients into downloading more resources simultaneously is to spread your web resources across different hostnames, e.g. images stored under http://images.yoursite.com. But you may want to test this and balance it out, as per the article's suggestion.
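For illustration, a tiny sketch of that hostname-sharding idea in C# - the hostnames and helper are hypothetical, and mapping each path to a fixed host keeps browser caching effective:

    using System;

    static class AssetHost
    {
        // Hypothetical asset hostnames; both would serve the same files.
        private static readonly string[] Hosts =
        {
            "http://static1.yoursite.com",
            "http://static2.yoursite.com",
        };

        // Deterministically pick a host for a given path so the same resource always
        // comes from the same hostname (and stays cached), while different resources
        // spread across hosts and gain extra parallel connections.
        public static string UrlFor(string path)
        {
            int bucket = (path.GetHashCode() & 0x7FFFFFFF) % Hosts.Length;
            return Hosts[bucket] + "/" + path.TrimStart('/');
        }
    }

    // Usage in a page (illustrative): <img src="<%= AssetHost.UrlFor("images/logo.png") %>" />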
You could try AJAX for that; since there are usually around 5 allowed HTTP connections between server and client, you could theoretically use them all at once.
However, I guess you will gain little advantage from this unless you have really big (or many) CSS and JavaScript files.
Not sure if this will work for images or other files.
