Downsides of 'Access-Control-Allow-Origin: *'? - http

I have a website with a separate subdomain for static files. I found out that I need to set the Access-Control-Allow-Origin header in order for certain AJAX features to work, specifically fonts. I want to be able to access the static subdomain from localhost for testing as well as from the www subdomain. The simple solution seems to be Access-Control-Allow-Origin: *. My server uses nginx.
What are the main reasons that you might not want to use a wildcard for Access-Control-Allow-Origin in your response header?

You might not want to use a wildcard when e.g.:
Your website and, say, its AJAX backend API run on different domains (or just on different ports) and you do not want to expose the backend API to the whole Internet. In that case you do not send *. For example, if your website is on http://www.example.com and the backend API is on http://api.example.com, the API would respond with Access-Control-Allow-Origin: http://www.example.com.
If the API needs the client to send cookies (a credentialed request), it must not send Access-Control-Allow-Origin: *. The value must be the exact origin of the actual request, and the response must also include Access-Control-Allow-Credentials: true.
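The two cases above can be sketched as a small server-side helper. This is only an illustration, not nginx configuration: the allow-list contents and the function name `cors_headers` are made up for the example, and it assumes the server can read the request's Origin header.

```python
from typing import Dict, Optional

# Hypothetical allow-list; these origins are illustrative only.
ALLOWED_ORIGINS = {"http://www.example.com", "http://localhost:8080"}


def cors_headers(request_origin: Optional[str],
                 with_credentials: bool = False) -> Dict[str, str]:
    """Return the CORS response headers for one request.

    Instead of sending "*", echo the request's Origin back only when it
    is on the allow-list. Unknown origins get no CORS headers at all,
    so the browser will not expose the response to their scripts.
    """
    if request_origin is None or request_origin not in ALLOWED_ORIGINS:
        return {}
    headers = {
        "Access-Control-Allow-Origin": request_origin,
        # The response now depends on Origin, so caches must key on it.
        "Vary": "Origin",
    }
    if with_credentials:
        # Needed for cookies; browsers reject "*" in this case.
        headers["Access-Control-Allow-Credentials"] = "true"
    return headers
```

For example, `cors_headers("http://www.example.com", with_credentials=True)` echoes the origin and adds the credentials header, while `cors_headers("http://evil.example")` returns an empty dict.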

For testing, adding an entry to the /etc/hosts file that maps 127.0.0.1 (or the server's public IP) to dev.mydomain.com is a decent workaround.
Another way is to have nginx itself serve another domain, such as dev.mydomain.com, pointing to the same (or a test) instance of the backend servers and static web root, with some security measures like:
satisfy all;
allow <YOUR-CIDR/IP>;
deny all;
Clarification on: Access-Control-Allow-Origin: *
This setting protects the users of your website from being scammed or hijacked while visiting other, malicious websites in a modern browser that respects this policy (all major browsers do).
This setting does not protect the web service from scraper scripts that access your static assets and APIs at high speed: brute-force attacks, bulk downloading, causing load, etc.
P.S: (1) For development: you can consider using a free, low-footprint private-p2p vpn-like network b/w your development box & server: https://tailscale.com/

In my opinion, the main downside is that other websites could consume your API without your explicit permission.
Imagine you have an e-commerce site. Another website could run all the transactions using its own look and feel but backed by you. In the end that is good for you because you still get the money, but your brand will lose its "recognition".
Another problem could arise if that website changed the payload sent to your backend, for example changing the delivery address and other things.
The idea is simply not to authorize unknown websites to consume your API and show its results to users.

You could use the hosts file to map 127.0.0.1 to your domain name, "dev.mydomain.com", as you do not like to use Access-Control-Allow-Origin: *.

Related

On Demand TLS and Reverse Proxy Support for Custom Domains

I came into a situation today. Please share your expertise 🙏
I have a project (my-app.com) and one of the features is to generate a status page consisting of different endpoints.
Current Workflow
User logs into the system.
User creates a status page for one of his sites (e.g. google) and adds different endpoints and components to be included on that page.
System generates a link for a given status page.
For Example. my-app.com/status-page/google
But the user may want to see this page in his custom domain.
For Example. status.google.com
Since this is a custom domain, we need on-demand TLS functionality. For this feature, I used Caddy and it is working fine. Caddy is running on our subdomain status.myserver.com, and the user's custom domain status.google.com has a CNAME to our subdomain status.myserver.com.
Besides on-demand TLS, I am also required to do a reverse proxy as shown below.
For Example. status.google.com ->(CNAME)-> status.myserver.com ->(REVERSE_PROXY)-> my-app.com/status-page/google
But Caddy supports only protocol, host, and port format for reverse proxy like my-app.com but my requirement is to support reverse proxy for custom page my-app.com/status-page/google. How can I achieve this? Is there a better alternative to Caddy or a workaround with Caddy?
You're right: since you can't use a path in a reverse-proxy upstream URL, you'd have to rewrite the request to include the path first, before initiating the reverse proxy.
Additionally, upstream addresses cannot contain paths or query strings, as that would imply simultaneous rewriting the request while proxying, which behavior is not defined or supported. You may use the rewrite directive should you need this.
So you should be able to use an internal caddy rewrite to add the /status-page/google path to every request. Then you can simply use my-app.com as your Caddy reverse-proxy upstream. This could look like this:
https:// {
    rewrite * /status-page/google{path}?{query}
    reverse_proxy http://my-app.com
}
You can find out more about all possible Caddy reverse_proxy upstream addresses you can use here: https://caddyserver.com/docs/caddyfile/directives/reverse_proxy#upstream-addresses
However, since you probably can't hard-code the name of the status page (/status-page/google) in your Caddyfile, you could set up a script (e.g. at /status-page) which takes a look at the requested URL, looks up the domain (e.g. status.google.com) in your database, and automatically outputs the correct status-page.
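A minimal sketch of that lookup step, assuming the proxy passes the original Host header through to the app. The `CUSTOM_DOMAINS` dict is a hypothetical stand-in for the database table, and the function name is made up for illustration.

```python
from typing import Dict, Optional

# Hypothetical stand-in for the database: custom domain -> status page slug.
CUSTOM_DOMAINS: Dict[str, str] = {
    "status.google.com": "google",
}


def status_page_path(host_header: str) -> Optional[str]:
    """Map the requested Host header to the internal status-page path.

    Returns None when the domain is not registered, so the caller can
    answer with a 404 instead of serving someone else's page.
    Note: this simple parsing does not handle bracketed IPv6 hosts.
    """
    # Drop an optional port, normalise case, and strip a trailing dot.
    host = host_header.split(":", 1)[0].strip().lower().rstrip(".")
    slug = CUSTOM_DOMAINS.get(host)
    return f"/status-page/{slug}" if slug else None
```

With this in place, a request for status.google.com resolves to /status-page/google without the name being hard-coded in the Caddyfile.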

How does CORS (Access-Control-Allow-Origin header) increase security?

I'm doing some work with this right now and I have to say, it makes no sense at all to me! Basically, I have some CDN server which provides CSS, images, etc. for a site. For whatever reason, in order for my browser to stop blocking those resources with a CORS error, I had to have that server (the CDN) add the Access-Control-Allow-Origin header. But as far as I can tell that does absolutely nothing to increase security. Shouldn't the page I request which references those cross-domain resources be telling the browser it's safe to get stuff from the other domain? If that were a malicious domain wouldn't it just have the Access-Control-Allow-Origin set to * so that sites load their malicious responses (you don't have to answer that because obviously they would)?
So can someone explain how this mechanism/feature provides security? As far as I can tell the implementors fucked up and it actually does nothing. The header should be required from the page which references/requests cross-domain resources rather than from that domain being requested.
To be clear: if I request a page at domain A, it would make sense for the response to include the Access-Control-Allow-Origin header whitelisting resources from domain B (Access-Control-Allow-Origin: *.B.com). However, it makes no sense at all for domain B to effectively whitelist itself by providing the header Access-Control-Allow-Origin: *, which is how this is currently implemented. Can anyone clarify what the benefit of this feature is?
If I have a protected resource hosted on site A, but also control sites B, C, and D, I may want to use that resource on all of my sites but still prevent anyone else from using that resource on theirs. So I instruct my site A to allow origins B, C, and D. Because the Access-Control-Allow-Origin header can carry only a single origin (or *), site A checks the Origin header of each request against that list and echoes it back when it matches. It's up to the web browser itself to honor this and not hand the response to the underlying JavaScript (or whatever initiated the request) if it didn't come from an allowed origin. Error handlers will be invoked instead. So it's really not for your security as much as it's an honor-system (all major browsers do this) access control method for servers.
Primarily Access-Control-Allow-Origin is about protecting data from leaking from one server (lets call it privateHomeServer.com) to another server (lets call it evil.com) via an unsuspecting user's web browser.
Consider this scenario:
You are on your home network browsing the web when you accidentally stumble onto evil.com. This web page contains malicious javascript that tries to look for web servers on your local home network and then sends their content back to evil.com. It does this by trying to open XMLHttpRequests on all local IP addresses (eg. 192.168.1.1, 192.168.1.2, .. 192.168.1.255) until it finds a web server.
If you are using an old web browser that isn't Access-Control-Allow-Origin aware, or you have set Access-Control-Allow-Origin: * on your privateHomeServer, then your browser would happily retrieve the data from your privateHomeServer (which presumably you didn't bother password-protecting as it was safely behind your home firewall) and then hand that data to the malicious JavaScript, which can then send the information on to the evil.com server.
On the other hand using an Access-Control-Allow-Origin aware browser and default web configuration on privateHomeServer (ie. not sending Access-Control-Allow-Origin *) your web browser would block the malicious javascript from seeing any data retrieved from privateHomeServer. So this way you are protected from such attacks unless you go out of your way to change the default configuration on your server.
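The decision the browser makes in this scenario can be modelled roughly as below. This is a simplified sketch of the check, not the full algorithm browsers implement (it ignores preflight requests, for instance), and the function and parameter names are invented for the example.

```python
from typing import Optional


def script_may_read(page_origin: str,
                    allow_origin: Optional[str],
                    credentialed: bool = False,
                    allow_credentials: Optional[str] = None) -> bool:
    """Simplified model of whether a browser exposes a cross-origin
    response to the script that requested it.

    page_origin       -- origin of the page running the script
    allow_origin      -- Access-Control-Allow-Origin value, or None if absent
    credentialed      -- True if the request carried cookies
    allow_credentials -- Access-Control-Allow-Credentials value, if any
    """
    if allow_origin is None:
        # Default server configuration: no header, so the script sees nothing.
        return False
    if credentialed:
        # The wildcard is not accepted when cookies are involved.
        return allow_origin == page_origin and allow_credentials == "true"
    return allow_origin == "*" or allow_origin == page_origin
```

In the scenario above, `script_may_read("http://evil.com", None)` is False for a default privateHomeServer, while `script_may_read("http://evil.com", "*")` is True once the wildcard has been configured.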
Regarding the question:
Shouldn't the page I request which references those cross-domain
resources be telling the browser it's safe to get stuff from the other
domain?
The fact that your page contains code that is attempting to get resources from a particular server is implicitly telling the web browser that you believe the resources are safe to fetch. It wouldn't make sense to need to repeat this again elsewhere.
CORS only makes sense for mashup content providers and nothing more.
Example: You are a provider of an embedded maps mashup service which requires registration. Now you want to make sure that your AJAX mashup map will only work for your registered users on their domains. Other domains should be excluded. Only for this reason does CORS make sense.
Another example: Someone misuses CORS for a REST service. A clever developer sets up an AJAX proxy and et voilà, you can access that service from every domain.
Such an AJAX proxy would make no sense for a mashup. On the other hand, CORS makes no sense for REST services, because you could bypass the restriction with a simple HTTP client.

What settings are required to put AWS CloudFront CDN in front of a squarespace website?

I had trouble getting AWS CloudFront to work with SquareSpace. Issues with forms not submitting and the site saying website expired. What are the settings that are needed to get CloudFront working with a Squarespace site?
This is definitely doable, considering I just set this up. Let me share the settings I used on Cloudfront, Squarespace, and Route53 to make it work. If you want to use a different DNS provider than AWS Route53, you should be able to adapt these settings. Keep in mind that this is not an e-commerce site, but a standard site with a blog, static pages, and forms. You can likely adapt these instructions for other issues as/if they come up.
Cloudfront (CDN)
To make this work, you need to create a Cloudfront Distribution for Web.
Origin Settings
Origin Domain Name should be set to ext-cust.squarespace.com. This is Squarespace's entry point for external domain names.
Origin Path can be left blank.
Origin ID is just the unique ID for this distribution and should auto-populate if you're on the distribution creation screen, or be fixed if you're editing Origin Settings later.
Origin Custom Headers do not need to be set.
Default Cache Behavior Settings / Behaviors
Path Patterns should be left at Default.
I have Viewer Protocol Policy set to Redirect HTTP to HTTPS. This dictates whether your site can use one or both of HTTP or HTTPS. I prefer to have all traffic routed securely, so I redirect all HTTP traffic to HTTPS. Note that you cannot do the reverse and redirect HTTPS to HTTP, as this will cause authentication issues (your browser doesn't want to expose what you thought was a secure connection).
Allowed HTTP Methods needs to be GET, HEAD, OPTIONS, PUT, POST, PATCH, DELETE. This is because forms (and other things such as comments, probably) use the POST HTTP method to work.
Cached HTTP Methods I left to just GET, HEAD. No need for anything else here.
Forward Headers needs to be set to All or Whitelist. Squarespace's entry point we mentioned earlier needs to know which domain you're coming from to serve your site, so the Host header must be whitelisted, or allowed with everything else if set to All.
Object Caching, Minimum TTL, Maximum TTL, and Default TTL can all be left at their defaults.
Forward Cookies is the missing component to get forms working. You can set this to either All or Whitelist. There are certain session variables that Squarespace uses for validation, security, and other utilities. I have added the following values to Whitelist Cookies: JSESSIONID, SS_MID, crumb, ss_cid, ss_cpvisit, ss_cvisit, test. Make sure to put each value on a separate line, without commas.
Forward Query Strings is set to True, as some Squarespace API calls use query strings so these must be passed along.
Smooth Streaming, Restrict Viewer Access, and Compress Objects Automatically can all be left at their default values, or chosen as required if you know you need them to be set differently.
Distribution Settings / General
Price Class and AWS WAF Web ACL can be left alone.
Alternate Domain Names should list your domain, and your domain with the www subdomain attached, e.g. example.com, www.example.com.
For SSL Certificate, please follow the tutorial here to upload your certificate to IAM if you haven't already, then refresh your certificates (there is a control next to the dropdown for this), select Custom SSL Certificate and select the one you've provisioned. This ensures that browsers recognize your SSL over HTTPS as valid. This is not necessary if you're not using HTTPS at all.
All following settings can be left at default, or chosen to meet your own specific requirements.
Route 53 (DNS)
You need to have a Hosted Zone set up for your domain (this is specific to Route 53 setup).
You need to set an A record to point to your Cloudfront distribution.
You should set a CNAME record for the www subdomain name pointing to your Cloudfront distribution, even if you don't plan on using it (later we'll go through setting Squarespace to only use the root domain by redirecting the www subdomain)
Squarespace
On your Squarespace site, you simply need to go to Settings->Domains->Connect a Third-Party Domain. Once there, enter your domain and continue. Under the domain's settings, you can uncheck Use WWW Prefix if you'd like people accessing your site from www.example.com to redirect to the root, example.com. I prefer this, but it's up to you. Under DNS Settings, the only value you need is CNAME that points to verify.squarespace.com. Add this CNAME record to your DNS settings on Route 53, or other DNS provider. It won't ever say that your connection has been fully completed since we're using a custom way of deploying, but that won't matter.
Your site should now be operating through Cloudfront pointing to your Squarespace deployment! Please note that DNS propagation takes time, so if you're unable to access the site, give it some time (up to several hours) to propagate.
Notes
I can't say exactly whether each and every one of the values set under Whitelist Cookies is necessary, but these are taken from using the Chrome Inspector to determine what cookies were present under the Cookie header in the request. Initially I tried to tell Cloudfront to whitelist the Cookie header itself, but it does not allow that (presumably because it wants you to use the cookie-specific whitelist). If your deployment is not working, see if there are more cookies being transmitted in your requests (under the Cookie header, the values you're looking for should look like my_cookie=somevalue;other_cookie=othervalue—my_cookie and other_cookie in my example are what you'd add to the whitelist).
The same procedure can be used to forward other headers entirely that may be needed via the Forward Headers whitelist. Simply inspect and see if there's something that looks like it might need to go through.
Remember, if you're not whitelisting a header or cookie, it's not getting to Squarespace. If you don't want to bother, or everything is effed (pardon my language), you can always set to allow all headers/cookies, although this adversely affects caching performance. So be conservative if you can.
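To pull the cookie names out of a Cookie header copied from the browser inspector, something like the following sketch could help. The function name is made up for illustration; it only splits the header text and makes no calls to CloudFront.

```python
from typing import List


def cookie_names(cookie_header: str) -> List[str]:
    """Extract the unique cookie names from a raw Cookie header value,
    preserving the order in which they first appear."""
    names: List[str] = []
    for pair in cookie_header.split(";"):
        # Everything before the first "=" is the cookie's name.
        name = pair.split("=", 1)[0].strip()
        if name and name not in names:
            names.append(name)
    return names


if __name__ == "__main__":
    header = "my_cookie=somevalue;other_cookie=othervalue"
    # One name per line, without commas, ready to paste into the whitelist.
    print("\n".join(cookie_names(header)))
```

Running it on the example header above prints my_cookie and other_cookie on separate lines.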
Hope this helps!
Here are the settings to get CloudFront working with Squarespace!
Behaviours:
Allowed HTTP Methods Ensure that you select: GET, HEAD, OPTIONS, PUT, POST, PATCH, DELETE. Otherwise forms will not work:
Forward Headers: Select whitelist and choose 'Host'. Otherwise squarespace will not know which website they need to load up and you get the message 'Website has expired' or similar.
Origins:
Origin Domain Name set as: ext-cust.squarespace.com
Origin Protocol Policy Select HTTPS so that traffic between the CDN and the origin is secure too
General
Alternate Domain Names (CNAMEs): put both your www and non-www addresses here and let Squarespace decide whether to direct www to root or vice versa (e.g. example.com www.example.com)
You can now configure SSL on CloudFront
HTTPS You can now enforce HTTPS using a certificate for your site here rather than in Squarespace
Setting I'm unsure about still:
Forward Query Strings: turning this off is recommended for caching reasons, but I think that could break things...
Route53
Create A records for www and root (e.g. example.com www.example.com) and set as an alias to your CloudFront distribution

How do you disallow crawling on origin server and yet have the robots.txt propagate properly?

I've come across a rather unique issue. If you deal with scaling large sites and work with a company like Akamai, you have origin servers that Akamai talks to. Whatever you serve to Akamai, they will propagate on their cdn.
But how do you handle robots.txt? You don't want Google to crawl your origin. That can be a HUGE security issue. Think denial of service attacks.
But if you serve a robots.txt on your origin with "disallow", then your entire site will be uncrawlable!
The only solution I can think of is to serve a different robots.txt to Akamai and to the world. Disallow to the world, but allow to Akamai. But this is very hacky and prone to so many issues that I cringe thinking about it.
(Of course, origin servers shouldn't be viewable to the public, but I'd venture to say most are for practical reasons...)
It seems an issue the protocol should be handling better. Or perhaps allow a site-specific, hidden robots.txt in the Search Engine's webmaster tools...
Thoughts?
If you really want your origins not to be public, use a firewall / access control to restrict access for any host other than Akamai - it's the best way to avoid mistakes and it's the only way to stop the bots & attackers who simply scan public IP ranges looking for webservers.
That said, if all you want is to avoid non-malicious spiders, consider using a redirect on your origin server which redirects any requests which don't have a Host header specifying your public hostname to the official name. You generally want something like that anyway to avoid issues with confusion or search rank dilution if you have variations of the canonical hostname. With Apache this could use mod_rewrite or even a simple virtualhost setup where the default server has RedirectPermanent / http://canonicalname.example.com/.
If you do use this approach, you could either simply add the production name to your test systems' hosts file when necessary or also create and whitelist an internal-only hostname (e.g. cdn-bypass.mycorp.com) so you can access the origin directly when you need to.
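The Host-based redirect can be sketched in a few lines, independent of the web server. This is only an illustration of the decision logic, assuming the CDN forwards the public hostname in the Host header. The hostnames are the examples from above and the function name is invented.

```python
from typing import Optional

# Example hostnames taken from the answer above.
CANONICAL_HOST = "canonicalname.example.com"
BYPASS_HOSTS = {"cdn-bypass.mycorp.com"}  # internal-only direct access


def redirect_target(host_header: Optional[str], path: str = "/") -> Optional[str]:
    """Return the URL to redirect to, or None if the request should be
    served by the origin as-is.

    Requests that name the canonical host (as the CDN would) or a
    whitelisted bypass host are served; anything else, such as a spider
    that found the origin's IP or raw hostname, is redirected to the
    canonical site.
    """
    host = (host_header or "").split(":", 1)[0].strip().lower()
    if host == CANONICAL_HOST or host in BYPASS_HOSTS:
        return None
    return f"http://{CANONICAL_HOST}{path}"
```

So a crawler asking the origin for /robots.txt under its raw hostname gets redirected to the public site, and never sees a separate origin-only robots.txt.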

serving images from one domain for multiple websites

We have nearly 13 domains within our company and we would like to serve images from one application in order to leverage caching.
For example, we will have c1.example.com and we will put all of our product images under this application. But here I have some doubts:
1- How can I force the client's browser to cache the image and not request it again?
2- When I reference those images in my application, I will use the following HTML markup:
<img src="http://c1.example.com/core/img1.png" />
But this causes a problem when I run the website under https: the browser gives a warning about the page. It should have used https://c1.example.com/core/img1.png when I run my apps under https. What should I do here? Should I always use https, or is there a way to switch automatically?
I will run my apps under IIS 7.
Yes, you need to serve all resources over https when the HTML page is served over https. That's the whole point of using https.
If the hrefs are hardcoded in the html one solution could be to use a Response Filter that will parse all content sent to the client and replace http with https when necessary. A simple Regular Expression should do the trick. There are plenty of articles out there about how these filters are working.
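As a language-neutral illustration of what such a filter does (this is not an IIS Response Filter, just the regular-expression idea sketched in Python), assuming the image host is c1.example.com as in the question:

```python
import re

IMAGE_HOST = "c1.example.com"

# Match "http://c1.example.com" only when the host name ends there, so
# look-alike hosts such as c1.example.com.evil.test are left untouched.
_HTTP_IMAGE_URL = re.compile(r"http://" + re.escape(IMAGE_HOST) + r"(?![\w.-])")


def upgrade_image_urls(html: str, page_is_https: bool) -> str:
    """Rewrite hard-coded http:// image URLs to https:// when the page
    itself is served over https; otherwise return the HTML unchanged."""
    if not page_is_https:
        return html
    return _HTTP_IMAGE_URL.sub("https://" + IMAGE_HOST, html)
```

Applied to the markup from the question with `page_is_https=True`, the src becomes https://c1.example.com/core/img1.png.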
About caching: you need to send the correct cache headers and ETag. There are several questions and answers on this on SO, like this one: IIS7 Cache-Control
You need to use HTTP headers to tell the browser how to cache. It should work by default (assuming you have no query string in your URLs) but if not, here's a knowledge base article about the cache-control header:
http://support.microsoft.com/kb/247404
I really don't know much about IIS, so I'm not sure if there are any other potential pitfalls. Note that browsers may still send HEAD requests sometimes.
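To make the header advice concrete, here is a rough sketch of what the response headers might look like and how a revalidation could be answered. On IIS this would normally be set in the server configuration rather than in code; the one-year max-age and the hash-based ETag are assumptions for the example, and the function names are invented.

```python
import hashlib
from typing import Dict


def image_cache_headers(body: bytes, max_age_seconds: int = 31536000) -> Dict[str, str]:
    """Build caching headers for an image response.

    Cache-Control lets the browser reuse the image without asking again
    until max-age expires; the ETag lets it revalidate cheaply afterwards.
    """
    etag = '"' + hashlib.sha256(body).hexdigest()[:32] + '"'
    return {
        "Cache-Control": f"public, max-age={max_age_seconds}",
        "ETag": etag,
    }


def is_not_modified(if_none_match: str, etag: str) -> bool:
    """True if the client's If-None-Match header matches the current ETag,
    in which case the server can answer 304 with no body."""
    candidates = [value.strip() for value in if_none_match.split(",")]
    return "*" in candidates or etag in candidates
```

A browser that already holds the image sends the ETag back in If-None-Match, and the server can reply 304 Not Modified instead of resending the file.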
I'd recommend you set up the image server so that HTTP and HTTPS are interchangeable, then just serve HTTPS URLs from HTTPS requests.
