Caching with Varnish & Varying over custom-set HTTP headers - http

I'm developing your standard high traffic ecommerce website and want to setup caching with Varnish. The particular thing on this setup is that the application will return different content depending on the user's particular location.
So my plans are these:
Setup Nginx with GeoIP module, so I can get a X-Country: XX header on all the requests going to the app backends.
Configure the Rails application to always return a "Vary: X-Country" response header.
Put the Varnish server behind the Nginx and the app backends, so it can cache multiple versions of the objects served by Rails, and serve them based on the request headers set by Nginx (not the client browser)
Does anyone have experience with a setup like this? Anything I should be aware of?

If GeoIP lookup is slow, and/or you want to enable people to override the country setting, you could use a country cookie and have the front-end Varnish check for it.
If there is no country cookie, forward the request to your nginx back-end for GeoIP lookup. Nginx serves a redirect with a Set-Cookie: country=us header. If you want to avoid redirects and support cookie-refusing clients/robots, ngingx can forward it to Rails and still try to set the country cookie in the response. Or Varnish can capture the redirect response and do a "restart" with the newly set cookie and go to the back-end
If you have already have a country cookie, use this in your Varnish hash
If Rails can do GeoIP resolving, you don't need Ngingx, except when you use it to serve files...

Related

Force nginx cache

We are stuck with a non-trivial question with nginx.
Our web server (Nginx) has a cache enabled for all links.
When we do newsletter campaigns, each customer accesses our website using a unique URL and this means that the nginx cache is not used.
This results in a high load on the website every time we send a newsletter, so we need to deal with this in some way.
An example of link
https://www.example.com/deals/calendarraa?utm_content=calendars_link&xnpe_tifc=hFzDOFnDbuPsOfnDx9pZVjQsVuU_O.bpx.US4FzXxDxZ4.VdxDPuxuYj4nTT&utm_source=newsletter&utm_campaign=deals%2520offer%3A%2520calendar%2520and%2520greeting%2520cards&utm_medium=email
Is it possible to configure nginx so that in case we have params starting with utm_ &/or xnpe_tifc, e.g. we ignore these parameters and nginx serves website with URL without them?
That means: https://www.example.com/deals/calendarra? or even better just https://www.example.com/deals/calendarra

Building URLs in Go including server scheme

I am creating a REST API in Go, and I want to build URLs to other resources in my replies.
Based on the http.Response I can get the Host and URL.
However, how would I go about getting the transport scheme used by the server? http or https?
I attemped to check if server.TLSConfig is nil and then assuming it is using http since it says this in the documentation for http.Server:
TLSConfig *tls.Config // optional TLS config, used by ListenAndServeTLS
But it turns out this exists even when I do not run the server with ListenAndServeTLS.
Or is this way of building my URLs the wrong way of doing things? Is there some other normal way of doing this?
My preferred solution when running http and https is just to run a simple listener on :80 that redirects all traffic to https. Then any real traffic can be assumed to be https.
Alternately I believe you can access a request's URL at req.URL.Scheme to see the protocol.
Or do you mean for the entire application? If you accept configuration to switch between http and https, then can't you look at that and see which they chose? I guess I'm missing some context maybe.
It is also common practice for apps to take a baseURL via flag or config to generate external urls with.

Downsides of 'Access-Control-Allow-Origin: *'?

I have a website with a separate subdomain for static files. I found out that I need to set the Access-Control-Allow-Origin header in order for certain AJAX features to work, specifically fonts. I want to be able to access the static subdomain from localhost for testing as well as from the www subdomain. The simple solution seeems to be Access-Control-Allow-Origin: *. My server uses nginx.
What are the main reasons that you might not want to use a wildcard for Access-Control-Allow-Origin in your response header?
You might not want to use a wildcard when e.g.:
Your web and let’s say its AJAX backend API are running on different domains, or just on different ports and you do not want to expose backend API to whole Internet, then you do not send *. For example your web is on http://www.example.com and backend API on http://api.example.com, then the API would respond with Access-Control-Allow-Origin: http://www.example.com.
If the API wants to request cookies from client it must not send Access-Control-Allow-Origin: *, but its value must be the value of the origin from the actual request.
For testing, actually adding entry in /ets/hosts file for 127.0.0.1/server-public-ip dev.mydomain.com is a decent workaround.
Other way can be to have another domain served by nginx itself like dev.mydomain.com pointing to the same/test-instance of backend servers & static-web-root with some security measures like:
satisfy all;
allow <YOUR-CIDR/IP>;
deny all;
Clarification on: Access-Control-Allow-Origin: *
This setting protects the users of your website from being scammed/hijacked while visiting other evil-websites in a modern-browser which respects this policy (all known browsers should do).
This setting does not protect the webservice from scraper scripts to access your static-assets & APIs at rapid speed - doing bruteforce attacks/bulk downloading/causing load etc.
P.S: (1) For development: you can consider using a free, low-footprint private-p2p vpn-like network b/w your development box & server: https://tailscale.com/
In my opinion, is that you could have other websites consuming your API without your explicit permission.
Imagine you have an e-commerce, another website could do all the transactions using their own look and feel but backed by you, for you, in the end, it is good because you will get the money in the end but your brand will lose its "recognition".
Another problem could be if this website would change the sent payload to your backend doing things like changing the delivery address and other things.
The idea behind is just to not authorize unknown websites to consume your API and show its result to users.
You could use the hosts file to map 127.0.0.1 to your domain name, "dev.mydomain.com", as you do not like to use Access-Control-Allow-Origin: *.

What settings are required to put AWS CloudFront CDN in front of a squarespace website?

I had trouble getting AWS CloudFront to work with SquareSpace. Issues with forms not submitting and the site saying website expired. What are the settings that are needed to get CloudFront working with a Squarespace site?
This is definitely doable, considering I just set this up. Let me share the settings I used on Cloudfront, Squarespace, and Route53 to make it work. If you want to use a different DNS provide than AWS Route53, you should be able to adapt these settings. Keep in mind that this is not an e-commerce site, but a standard site with a blog, static pages, and forms. You can likely adapt these instructions for other issues as/if they come up.
Cloudfront (CDN)
To make this work, you need to create a Cloudfront Distribution for Web.
Origin Settings
Origin Domain Name should be set to ext-cust.squarespace.com. This is Squarespace's entry point for external domain names.
Origin Path can be left blank.
Origin ID is just the unique ID for this distribution and should auto-populate if you're on the distribution creation screen, or be fixed if you're editing Origin Settings later.
Origin Custom Headers do not need to be set.
Default Cache Behavior Settings / Behaviors
Path Patterns should be left at Default.
I have Viewer Protocol Policy set to Redirect HTTP to HTTPS. This dictates whether your site can use one or both of HTTP or HTTPS. I prefer to have all traffic routed securely, so I redirect all HTTP traffic to HTTPS. Note that you cannot do the reverse and redirect HTTPS to HTTP, as this will cause authentication issues (your browser doesn't want to expose what you thought was a secure connection).
Allowed HTTP Methods needs to be GET, HEAD, OPTIONS, PUT, POST, PATCH, DELETE. This is because forms (and other things such as comments, probably) use the POST HTTP method to work.
Cached HTTP Methods I left to just GET, HEAD. No need for anything else here.
Forward Headers needs to be set to All or Whitelist. Squarespace's entry point we mentioned earlier needs to know where what domain you're coming from to serve your site, so the Host header must be whitelisted, or allowed with everything else if set to All.
Object Caching, Minimum TTL, Maximum TTL, and Default TTL can all be left at their defaults.
Forward Cookies cookies is the missing component to get forms working. Either you can set this to All, or Whitelist. There are certain session variables that Squarespace uses for validation, security, and other utilities. I have added the following values to Whitelist Cookies: JSESSIONID, SS_MID, crumb, ss_cid, ss_cpvisit, ss_cvisit, test. Make sure to put each value on a separate line, without commas.
Forward Query Strings is set to True, as some Squarespace API calls use query strings so these must be passed along.
Smooth Streaming, Restrict Viewer Access, and Compress Objects Automatically can all be left at their default values, or chosen as required if you know you need them to be set differently.
Distribution Settings / General
Price Class and AWS WAF Web ACL can be left alone.
Alternate Domain Names should list your domain, and your domain with the www subdomain attached, e.g. example.com, www.example.com.
For SSL Certificate, please follow the tutorial here to upload your certificate to IAM if you haven't already, then refresh your certificates (there is a control next to the dropdown for this), select Custom SSL Certificate and select the one you've provisioned. This ensures that browsers recognize your SSL over HTTPS as valid. This is not necessary if you're not using HTTPS at all.
All following settings can be left at default, or chosen to meet your own specific requirements.
Route 53 (DNS)
You need to have a Hosted Zone set up for your domain (this is specific to Route 53 setup).
You need to set an A record to point to your Cloudfront distribution.
You should set a CNAME record for the www subdomain name pointing to your Cloudfront distribution, even if you don't plan on using it (later we'll go through setting Squarespace to only use the root domain by redirecting the www subdomain)
Squarespace
On your Squarespace site, you simply need to go to Settings->Domains->Connect a Third-Party Domain. Once there, enter your domain and continue. Under the domain's settings, you can uncheck Use WWW Prefix if you'd like people accessing your site from www.example.com to redirect to the root, example.com. I prefer this, but it's up to you. Under DNS Settings, the only value you need is CNAME that points to verify.squarespace.com. Add this CNAME record to your DNS settings on Route 53, or other DNS provider. It won't ever say that your connection has been fully completed since we're using a custom way of deploying, but that won't matter.
Your site should now be operating through Cloudfront pointing to your Squarespace deployment! Please note that DNS propogation takes time, so if you're unable to access the site, give it some time (up to several hours) to propogate.
Notes
I can't say exactly whether each and every one of the values set under Whitelist Cookies is necessary, but these are taken from using the Chrome Inspector to determine what cookies were present under the Cookie header in the request. Initially I tried to tell Cloudfront to whitelist the Cookie header itself, but it does not allow that (presumably because it wants you to use the cookie-specific whitelist). If your deployment is not working, see if there are more cookies being transmitted in your requests (under the Cookie header, the values you're looking for should look like my_cookie=somevalue;other_cookie=othervalue—my_cookie and other_cookie in my example are what you'd add to the whitelist).
The same procedure can be used to forward other headers entirely that may be needed via the Forward Headers whitelist. Simply inspect and see if there's something that looks like it might need to go through.
Remember, if you're not whitelisting a header or cookie, it's not getting to Squarespace. If you don't want to bother, or everything is effed (pardon my language), you can always set to allow all headers/cookies, although this adversely affects caching performance. So be conservative if you can.
Hope this helps!
Here are the settings to get CloudFront working with Squarespace!
Behaviours:
Allowed HTTP Methods Ensure that you select: GET, HEAD, OPTIONS, PUT, POST, PATCH, DELETE. Otherwise forms will not work:
Forward Headers: Select whitelist and choose 'Host'. Otherwise squarespace will not know which website they need to load up and you get the message 'Website has expired' or similar.
Origins:
Origin Domain Name set as: ext-cust.squarespace.com
Origin Protocol Policy Select HTTPS so that traffic between the CDN and the origin is secure too
General
Alternate Domain Names (CNAMEs) put both your www and none www addresses here and let Squarespace decide on if to direct www to root or vice-versa (.e.g example.com www.example.com)
You can now configure SSL on CloudFront
HTTPS You can now enforce HTTPS using a certificate for your site here rather than in Squarespace
Setting I'm unsure about still:
Forward Query Strings: recommended not for caching reasons but I think this could break things...
Route53
Create A records for www and root (e.g. example.com www.example.com) and set as an alias to your CloudFront distribution

http redirects to https

What would cause a site to try an go to an https url?
We have sitecore set up to redirect non www URLs to www pre-pended URLs. Example: joesrx.com resolves to www.joesrx.com through the Sitecore URLResolver.
What we are seeing is that if you type joesrx.com, it tries to go to https://joesrx.com before it hits the Sitecore server. Since there are no certificates on this server and https is not utilized we get a 404.
Is there something in IIS that is misconfigured? Proxy teams says it is not in their setting and network team says all of the DNS entries are correct.
As a general rule for debugging these sorts of problems, try to imagine all the elements between you and the application and then use a simple divide and conquer approach. You can also test behavior on individual levels of the path between you and the actual application.
In this case for example (from you to application code):
User
Browser
browser may do caching of redirects. Try a different browser / try incognito mode / clear cache
Browser Extensions/Settings
any extensions which make it so the browser always tries to access website(s) via https? Try with extension disabled / another browser
Proxies/Firewalls
any Proxies/Firewalls on your end which may modify requests? Can you try to access the site bypassing any proxies/firewalls, maybe from a different network connection?
Network
Web Server
Web Server Configuration / Pipelines / Resolvers / Filters / Etc.
.htaccess / IIS config / nginx config / servlet filters / (lots of options depending on your framework). Check the server
Actual application code
well.. check the code.
Example of divide and conquer, choosing the Network mid-point: Try accessing the URL with wget/curl from command-line, curl -i will also show you the headers received from the server. If you find a "Location: .." header it's clear that the server is sending a redirect. So now you only have to check Web Server / framework configuration and actual application code.
There are a few things I would check first:
Do you have rewrite rules in your web.config? They may be pattern-matching on www. and redirecting in order to enforce SSL
Do you have code in your pipelines that is attempting to enforce SSL for specific paths? The code here may not be checking the URL correctly.
In IIS, did you bind the 'www' host name to your IIS site? Or is it falling through to another site that has SSL enforced?
In case the other answers don't help, check for HTST headers such as "Strict-Transport-Security: max-age=31536000".
This HTTP header tells browsers to use only SSL for future requests (among other things).
For more info check out:
https://www.owasp.org/index.php/HTTP_Strict_Transport_Security

Resources