How does a URL qualify to be called a CDN - cdn

I would like to build my own Content Distribution Network, and I have come across a blog post showing how to do it with one of Google's services, but I can't understand it. Here are a few more questions.
Are images hosted in Flickr considered to be "in a CDN"?
If I create a subdomain within my domain, put directories for files in there, and link from my site to that subdomain, is that considered a CDN?

From Wikipedia:
A content delivery network or content distribution network (CDN) is a system of computers containing copies of data, placed at various points in a network so as to maximize bandwidth for access to the data from clients throughout the network. A client accesses a copy of the data near to the client, as opposed to all clients accessing the same central server so as to avoid bottleneck near that server.
I suppose that once you replicate content across some computers around the world, any URL should be fine.

A content distribution network (CDN) is not just another place on your website. A CDN is a way of distributing content to different geographical/network locations.
CDNs typically route requests through DNS: the resolver on the client's side of the request (often the ISP's) is given the address of a CDN node close to that client.

Related

What are the potential risks of not using a Web Application Firewall?

I develop and manage a small promotional/marketing website on WordPress for a startup SaaS product. We're using Cloudflare for DNS and whatnot. Apparently the WAF has been turned on, which uses a proxy and changes the user's IP address. I'm trying to use the IP address to filter "internal" traffic for Google Analytics, and the only way this works is with the WAF turned off. If not using the WAF is going to cause any sort of significant risk for my website, then obviously I'll need another way to do my analytics thing. Reading about what it provides on their website doesn't make it all that clear to me how important it is for a website like this. If anyone who "gets it" has some insight to share, I'd be most appreciative. Thanks!
You should definitely use the WAF - it will protect your website from many malicious bots and attacks.
WordPress sites are particularly juicy targets for attackers, for a number of reasons:
The security of a default WordPress installation is not great.
Every WordPress site shares common default features, such as the location of the admin login page, the admin username, and other exploitable resources.
WordPress is extremely popular, and currently used by an estimated third of all websites on the internet.
WordPress is used by many, many small businesses and hobbyists who do not know how to secure their site properly.
Ergo, attackers can very easily scour the web for WordPress websites that are easily hackable. Other nefarious activities are commonly carried out with ease on most WordPress sites, such as comment spam or Denial of Service attacks.
What protection does the WAF offer?
Cloudflare and most other high quality WAFs can be configured to protect your site by automatically performing actions like:
Blocking known bad IP addresses.
Blocking bad bots which are automatically making requests to your site.
Limiting high numbers of requests from one source in a short amount of time (usually a sign of a DoS attack or scraping).
Blocking requests from particular countries or locations.
There is no reason why you wouldn't want to enable this protection if you have it available to you, and Cloudflare is the industry leader in this area.
Additionally, I would recommend you research how to better secure your Wordpress site in ways other than just the WAF - e.g. The Ultimate WordPress Security Guide
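To make the rate-limiting item in the list above more concrete, here is a minimal sketch of a per-source sliding-window limiter in Python. It is only an illustration of the idea, not how Cloudflare or any particular WAF implements it; the class name, method names and thresholds are all made up for the example.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per `window_seconds` for each source IP."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # source IP -> timestamps of that source's recent accepted requests
        self._hits = defaultdict(deque)

    def allow(self, source_ip, now=None):
        # `now` can be injected for testing; otherwise use a monotonic clock.
        now = time.monotonic() if now is None else now
        hits = self._hits[source_ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: a WAF would block or challenge here
        hits.append(now)
        return True
```

A request handler would call `allow(ip)` for each incoming request and reject the request when it returns `False`. Each source is tracked separately, so one noisy client does not affect the others.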
How to solve the IP address issue
Cloudflare is not changing the user's (the client) IP address, but rather acting as a proxy. As you have noticed, the IP address you're seeing is not the client's own, but one of Cloudflare's. This is crucial to how Cloudflare works to protect your site, but this is a common issue when using any kind of proxy.
To get the correct IP address when using a proxy, you need to check the X-FORWARDED-FOR header. You might see this as a string of comma-separated IP addresses, depending on how many proxies the user has gone through before reaching the site. The first one in the list is the original client IP.
e.g. Here 203.0.113.1 is the client's original IP address:
X-Forwarded-For: 203.0.113.1,198.51.100.101,198.51.100.102
Documentation: How does Cloudflare handle HTTP Request headers?
Anyway, it's good to use a function which can comprehensively check headers and give you the best match for the original client IP, regardless of whether the user is behind a proxy or not, so that you can guarantee it always works.
Here's a very popular StackOverflow question about this:
What is the most accurate way to retrieve a user's correct IP address in PHP?
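As a rough illustration of such a function (in Python rather than PHP), the sketch below checks Cloudflare's CF-Connecting-IP header first, then the left-most valid entry of X-Forwarded-For, and finally falls back to the connection's own address. The function name and the `headers`/`remote_addr` parameters are assumptions for the example, not part of any framework.

```python
import ipaddress


def client_ip(headers, remote_addr):
    """Best-effort original client IP from proxy headers.

    `headers` is a plain dict of request headers and `remote_addr` is the
    peer address of the connection; both names are illustrative.
    """
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    candidates = []
    if "cf-connecting-ip" in lowered:
        candidates.append(lowered["cf-connecting-ip"])
    if "x-forwarded-for" in lowered:
        # Left-most entry is the original client; later ones are proxies.
        candidates.extend(lowered["x-forwarded-for"].split(","))
    for candidate in candidates:
        try:
            return str(ipaddress.ip_address(candidate.strip()))
        except ValueError:
            continue  # skip malformed values
    return remote_addr
```

Keep in mind that a client can send these headers itself, so they should only be trusted when the request really arrived through a proxy you know, such as Cloudflare.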

Two Webapps have the same IP address

I have a webapp set up with WordPress with a specific IP address (which is also pointed towards a custom domain).
The problem is, when I add a new webapp (also with WordPress), it gets allocated the same IP address as the first webapp, causing it to redirect to the first webapp.
I have set up the second webapp with the same subscription plan and am using the same database for both.
Also, the first time I made a second webapp, it had its own separate IP, but due to some issue, I deleted it and made a new webapp with the same name. Now whatever I do, and no matter how many new webapps I make, they are all allocated the same IP as the first webapp. Any solutions?
Thanks!
Azure Web Apps are created behind a set of load balancers that differentiate between Web Apps based on the incoming request.
If your two Web Apps are located at example.com and example.org and you have configured both in DNS to point to the same IP address, then the load balancers at the front should decide where to send the request based on the hostname that is requested.
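Conceptually, that decision looks something like the sketch below: the front end reads the Host header of the request and looks up which app it belongs to. This is only an illustration of host-based routing, not Azure's actual implementation; the function name, the routing table and the backend names are hypothetical.

```python
def pick_backend(host_header, routes):
    """Return the backend registered for the requested hostname, or None."""
    # Drop an optional ":port" suffix and ignore case, since hostnames are
    # case-insensitive. (IPv6 literals are not handled in this sketch.)
    hostname = host_header.split(":", 1)[0].strip().lower()
    return routes.get(hostname)


# Hypothetical routing table: both sites resolve to the same public IP,
# but each hostname maps to its own app.
ROUTES = {
    "example.com": "webapp-one",
    "example.org": "webapp-two",
}
```

With this table, `pick_backend("example.org", ROUTES)` selects the second app even though both hostnames share one IP address.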
This is more likely to be a problem caused by using the same backend database for two different WordPress sites. (Unfortunately I'm not a WordPress expert, so I can't comment on what that might be, but this answer will hopefully help those who do know about WordPress by making it clear that this is not likely to be an Azure issue.)
As Michael indicated, Azure Web Apps sites sit behind load balancers or ARR front ends. However, there is more to this.
When you create a site in Web Apps, you actually create an App Service Plan as well (this corresponds to a VM).
So when you create the second site you will get an option to either choose the same or a new app service plan.
If you choose the same app service plan, then both of your sites will sit on the same VM and as a result will be behind the same ARR FE.
If you choose to create a new app service plan, there might be a possibility that the VM will be allocated either behind the same or a different Front end. This cannot be controlled. The Fabric controller makes the allocation based on availability.
Either way, this shouldn't be a problem. It is okay for two sites to share an IP address. However, if you wish to have separate IP addresses for your sites, you can use one of these options:
Create the site in a different data centre
Create the second site under new app service plan. There is a high chance that the app service plan might be allocated under a different ARR FE.
Scale the site to the Standard tier or higher and use IP-based SSL. This will allocate a dedicated IP for your site. There is an additional cost associated with getting a dedicated IP. Refer to the Azure App Service pricing for this.

What is the advantage of using proxy in network for accessing internet?

My college has different proxies for accessing the Internet, like 192.168.0.2/3/4, and also a specific port number. What is the advantage of using this? I would also like to know what exactly happens there. I also heard that my institution has different ISP connections shared over the same network. What is the role of the proxy there?
It will be very easy to know if you understand what proxies do and why they are generally used, which can be found on a magical website called www.google.com. By using a proxy, you get more control over the network because all requests go through it. Your school may want to do things like traffic shaping, content filtering, etc. Using the proxy server makes sure all requests to the internet are routed there first.
Proxies are good for a few things:
Filtering. By using a proxy, your college can filter out viruses, porn, Facebook or torrent downloads.
Logging. By requiring a username and password, the college can track what you do with your internet time, and can tell you off if you go somewhere you shouldn't, or help you by using the logs for traffic shaping or other network maintenance.
Line Bonding. For example, if you have two ADSL lines of 5Mb, you can bond those to get a 10Mb line (normally this is done at the gateway stage, and not the proxy, but it is possible to do it at this stage of the network)
Failover. Again, this would normally be done at the gateway/router stage. This detects which lines are active and routes your traffic to those lines.
Network Connectivity. If your college is in turn part of a bigger academic network, this could allow crossing those network boundaries to get internet access.
Although those are valid possibilities, it's probably just for Filtering...
In the wider internet, proxies are in use for allowing access to blocked content - like giving China access to Google...

Would implementing a CDN involve moving images and changing path names?

I'm just learning about CDNs, so please forgive if this is a dumb question.
Would implementing a CDN involve moving images and changing paths?
Yes, a CDN (Content Delivery Network) is at its basis nothing more than a set of webservers.
If you want to host files on a CDN you must copy your files to the CDN servers and then use the full CDN address that points to those files on those servers on your own webpage.
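As a small sketch of what "changing paths" can mean in practice, the Python function below rewrites static asset URLs so they point at a CDN hostname and leaves other URLs untouched. The hostname `cdn.example.com`, the list of extensions and the function name are assumptions for the example; a real site would normally do this in its templates or build step.

```python
from urllib.parse import urlsplit, urlunsplit

CDN_HOST = "cdn.example.com"  # hypothetical CDN hostname
STATIC_EXTENSIONS = (".png", ".jpg", ".gif", ".css", ".js")


def to_cdn_url(url):
    """Point static asset URLs at the CDN host; leave everything else alone."""
    parts = urlsplit(url)
    if not parts.path.lower().endswith(STATIC_EXTENSIONS):
        return url
    # Keep the original scheme if there is one, otherwise assume https.
    return urlunsplit(
        (parts.scheme or "https", CDN_HOST, parts.path, parts.query, parts.fragment)
    )
```

The files themselves still have to exist on the CDN under the same paths for the rewritten URLs to work.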
You can also use a CDN-style hostname on the same server but under a different URI. For instance, serving your page from www.example.com and your static files from cdn.example.com (with cdn.example.com as a vhost alias) should be faster than getting all data from www.example.com alone. I think that's because browsers limit the number of simultaneous HTTP connections per hostname.
Of course it's best if you have it in another server, in this case you have to copy everything.
Not necessarily. You can use a service such as CloudFlare which requires only a modification of some of your DNS settings. In short, the service determines which files being served are static, and caches those in its network, generally reducing overall traffic to your servers. You also get the benefit of any geographical distribution the service provides that your own hosting service might not.

Redirecting http traffic to another server temporarily

Assume you have one box (a dedicated server) that's on 24/7 and several other boxes that are user machines with unused bandwidth. Assume you want to host several web pages. How can the dedicated server redirect HTTP traffic to the user machines? It is desirable that the address field in the web browser still displays the right address, and not an IP. That is, I don't want to redirect to another web page; I want to tell the web browser that it should request the same web page from a different server. I have been browsing through the 3xx codes, and I don't think they are made for anything like this.
It should work somewhat along these lines:
1. Dedicated server is online all the time.
2. User machine starts and tells the dedicated server that it's online.
(several other user machines can do similarly)
3. Web browser looks up domain name and finds out that it points to dedicated server.
4. Web browser requests page.
5. Dedicated server tells web browser to repeat request to user machine.
Is it possible to use some kind of redirect, and preferably tell the browser to keep sending further requests to the user machine? The user machine can close down at almost any point in time, but it is assumed that the user machine will wait for ongoing transactions to finish, so there is no closing the server program in the middle of a GET or something.
What you want is called a Proxy server or load balancer that would sit in front of your web server.
The web browser would always talk to the load balancer, and the load balancer would forward the request to one of several back-end servers. No redirect is needed on the client side, as the client always thinks it is just talking to the load balancer.
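The core of such a load balancer is just a pool of back ends and a rule for picking the next one. Below is a minimal Python sketch of a round-robin pool in which machines can register and unregister at any time; the class and method names are invented for the example, and a real load balancer would also forward the request and health-check its back ends.

```python
import threading


class BackendPool:
    """Round-robin pool whose members can join and leave at any time."""

    def __init__(self):
        self._backends = []
        self._next = 0
        self._lock = threading.Lock()

    def register(self, address):
        # Called when a machine announces that it is online.
        with self._lock:
            if address not in self._backends:
                self._backends.append(address)

    def unregister(self, address):
        # Called when a machine shuts down or stops responding.
        with self._lock:
            if address in self._backends:
                self._backends.remove(address)

    def pick(self):
        """Return the next backend in rotation, or None if none are online."""
        with self._lock:
            if not self._backends:
                return None
            backend = self._backends[self._next % len(self._backends)]
            self._next += 1
            return backend
```

The load balancer would call `pick()` for every incoming request and forward the request to the returned address.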
ETA:
Looking at your various comments and re-reading the question, I think I misunderstood what you wanted to do. I was thinking that all the machines serving content would be on the same network, but now I see that you are looking for something more like a p2p web server setup.
If that's the case, using DNS and HTTP 30x redirects would probably be what you need. It would probably look something like this:
Your "master" server would serve as an entry point for the app, and would have a well known host name, e.g. "www.myapp.com".
Whenever a new "user" machine came online, it would register itself with the master server, and the master server would create or update a DNS entry for that user machine, e.g. "user123.myapp.com".
If a request came to the master server for a given page, e.g. "www.myapp.com/index.htm", it would do a 302 redirect to one of the user machines based on whatever DNS entry it had created for that machine - e.g. redirect them to "user123.myapp.com/index.htm".
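The redirect step above could be sketched in Python with the standard library's http.server module, as below. This is a rough illustration under assumed names: the `ONLINE_HOSTS` list stands in for whatever registry the master keeps, and the way a host is chosen per path is arbitrary.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical registry of user machines that have announced themselves.
ONLINE_HOSTS = ["user123.myapp.com"]


def redirect_location(path, hosts):
    """Build the Location value for a request path, or None if nobody is online."""
    if not hosts:
        return None
    # Spread paths over hosts deterministically; any selection policy would do.
    host = hosts[sum(path.encode()) % len(hosts)]
    return f"http://{host}{path}"


class MasterHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        location = redirect_location(self.path, ONLINE_HOSTS)
        if location is None:
            self.send_error(503, "No user machine online")
            return
        self.send_response(302)  # temporary redirect to the chosen user machine
        self.send_header("Location", location)
        self.end_headers()
```

Passing `MasterHandler` to `http.server.HTTPServer` and calling `serve_forever()` would run it. Note that after the redirect the browser's address bar shows the user machine's hostname rather than the master's.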
Some problems I see with this approach:
First, once a user gets redirected to a user machine, if that machine went offline it would seem like the app was dead. You could avoid this by having all the links on every page specifically point to "www.myapp.com" instead of using relative links, but then every single request has to be routed through the "master server", which would be relatively inefficient.
You could potentially solve this by changing the DNS entry for a user machine when it goes offline to point back to the master server, but that wouldn't work without an extremely short TTL.
Another issue you'll have is tracking sessions. You probably wouldn't be able to use sessions very effectively with this setup without a shared session state server of some sort accessible by all the user machines. Although cookies should still work.
In networking, load balancing is a technique to distribute workload evenly across two or more computers, network links, CPUs, hard drives, or other resources, in order to get optimal resource utilization, maximize throughput, minimize response time, and avoid overload. Using multiple components with load balancing, instead of a single component, may increase reliability through redundancy. The load balancing service is usually provided by a dedicated program or hardware device (such as a multilayer switch or a DNS server).
and more interesting stuff in here
Apart from load balancing, you will need to set up a more or less identical environment on the "user machines".
This sounds like 1 part proxy, 1 part load balancer, and about 100 parts disaster.
If I had to guess, I'd say you're trying to build some type of relatively anonymous torrent... But I may be wrong. If I'm right, HTTP is entirely the wrong protocol for something like this.
You could use DNS. Off the top of my head, you could set up a record under the same hostname for each machine that is going to serve users:
www IN A xxx.xxx.xxx.xxx ; IP address of machine 1
www IN A xxx.xxx.xxx.xxx ; IP address of machine 2
www IN A xxx.xxx.xxx.xxx ; IP address of machine 3
Then as others come online, you could add them to the DNS entries:
www IN A xxx.xxx.xxx.xxx ; IP address of machine 4
The only problem is that you'll have to lower the time to live (TTL) for each record (I think the default is 86400, i.e. 1 day).
If a machine goes down, you'll have to remove its DNS entry, though I do think this is the least intensive way of adding capacity to any website. Jeff Atwood has more info here: is round robin dns good enough?
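If the list of online machines changes often, the records could be generated rather than edited by hand. The Python sketch below renders one A record per online machine with an explicit short TTL, in standard zone-file order (name, TTL, class, type, address). The function name and the 60-second default are assumptions; how the records get pushed to your DNS server depends entirely on that server.

```python
def render_a_records(name, addresses, ttl=60):
    """Render one zone-file A record per online machine, with a short TTL."""
    return [f"{name} {ttl} IN A {address}" for address in addresses]
```

Calling it again with the current list whenever a machine joins or leaves gives the full set of records to publish.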

Resources