and its URL is secured with SSL (https://mywebsite.nl).
However, I found out that for a long time I have been using http://mywebsite.nl (non-secured) as the 'Default URL' of my Google Analytics property and view.
I have two questions:
Did I miss data because I used http instead of https as the Default URL of the property and view?
Can I CHANGE http to https in the Google Analytics property/view without problems, or will I lose historical data? (This probably also depends on the answer to Q1.) Or should I ADD a new property and/or view with an https Default URL?
Thanks!
You didn't.
You won't lose historical data, so feel free to change it.
The Default URL is there for your convenience, and you can set it to anything. GA only uses it to build full URLs from page paths (for example, for links in reports) instead of using the Hostname dimension.
Also, GA is gracious enough to warn you whenever you make a change that significantly affects your core data.
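To make this concrete, here is a hypothetical sketch (not GA's actual code) of how a full report link can be formed by joining the Default URL with a stored page path. Because the scheme only enters at display time, switching the Default URL from http to https changes the links, not the stored data.

```javascript
// Hypothetical illustration: a report link is the Default URL joined with the
// stored page path at display time; the stored path itself carries no scheme.
function reportLink(defaultUrl, pagePath) {
  return new URL(pagePath, defaultUrl).href;
}

const storedPagePath = "/contact"; // what the report keeps for the hit
console.log(reportLink("http://mywebsite.nl", storedPagePath));  // http://mywebsite.nl/contact
console.log(reportLink("https://mywebsite.nl", storedPagePath)); // https://mywebsite.nl/contact
```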
I had trouble getting AWS CloudFront to work with SquareSpace. Issues with forms not submitting and the site saying website expired. What are the settings that are needed to get CloudFront working with a Squarespace site?
This is definitely doable, considering I just set this up. Let me share the settings I used on CloudFront, Squarespace, and Route 53 to make it work. If you want to use a different DNS provider than AWS Route 53, you should be able to adapt these settings. Keep in mind that this is not an e-commerce site but a standard site with a blog, static pages, and forms. You can likely adapt these instructions for other issues if they come up.
Cloudfront (CDN)
To make this work, you need to create a CloudFront web distribution.
Origin Settings
Origin Domain Name should be set to ext-cust.squarespace.com. This is Squarespace's entry point for external domain names.
Origin Path can be left blank.
Origin ID is just the unique ID for this distribution and should auto-populate if you're on the distribution creation screen, or be fixed if you're editing Origin Settings later.
Origin Custom Headers do not need to be set.
Default Cache Behavior Settings / Behaviors
Path Patterns should be left at Default.
I have Viewer Protocol Policy set to Redirect HTTP to HTTPS. This dictates whether your site can use one or both of HTTP or HTTPS. I prefer to have all traffic routed securely, so I redirect all HTTP traffic to HTTPS. Note that you cannot do the reverse and redirect HTTPS to HTTP, as this will cause authentication issues (your browser doesn't want to expose what you thought was a secure connection).
Allowed HTTP Methods needs to be GET, HEAD, OPTIONS, PUT, POST, PATCH, DELETE. This is because forms (and other things such as comments, probably) use the POST HTTP method to work.
Cached HTTP Methods I left to just GET, HEAD. No need for anything else here.
Forward Headers needs to be set to All or Whitelist. The Squarespace entry point mentioned earlier needs to know which domain is being requested in order to serve your site, so the Host header must be whitelisted (or forwarded along with everything else if set to All).
Object Caching, Minimum TTL, Maximum TTL, and Default TTL can all be left at their defaults.
Forward Cookies is the missing component to get forms working. You can set this to either All or Whitelist. Squarespace uses certain session cookies for validation, security, and other utilities. I added the following values to Whitelist Cookies: JSESSIONID, SS_MID, crumb, ss_cid, ss_cpvisit, ss_cvisit, test. Make sure to put each value on a separate line, without commas.
Forward Query Strings is set to True, as some Squarespace API calls use query strings so these must be passed along.
Smooth Streaming, Restrict Viewer Access, and Compress Objects Automatically can all be left at their default values, or chosen as required if you know you need them to be set differently.
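The cache behavior settings above map onto fields of the CloudFront API's DefaultCacheBehavior. This is a partial sketch: TargetOriginId and other required fields are omitted, and ForwardedValues is CloudFront's legacy mechanism (AWS now recommends cache and origin request policies instead), so check the field names against the current AWS documentation before relying on it. The cookie list is the one given above.

```javascript
// Partial sketch of the legacy CloudFront DefaultCacheBehavior fields that
// correspond to the console settings described above.
function squarespaceCacheBehavior(cookieNames) {
  const allowed = ["GET", "HEAD", "OPTIONS", "PUT", "POST", "PATCH", "DELETE"];
  const cached = ["GET", "HEAD"];
  return {
    ViewerProtocolPolicy: "redirect-to-https", // Redirect HTTP to HTTPS
    AllowedMethods: {
      Quantity: allowed.length,
      Items: allowed, // POST is needed for forms
      CachedMethods: { Quantity: cached.length, Items: cached },
    },
    ForwardedValues: {
      QueryString: true, // Forward Query Strings: Yes
      Headers: { Quantity: 1, Items: ["Host"] }, // Squarespace routes on Host
      Cookies: {
        Forward: "whitelist",
        WhitelistedNames: { Quantity: cookieNames.length, Items: cookieNames },
      },
    },
  };
}

const behavior = squarespaceCacheBehavior([
  "JSESSIONID", "SS_MID", "crumb", "ss_cid", "ss_cpvisit", "ss_cvisit", "test",
]);
console.log(JSON.stringify(behavior, null, 2));
```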
Distribution Settings / General
Price Class and AWS WAF Web ACL can be left alone.
Alternate Domain Names should list your domain, and your domain with the www subdomain attached, e.g. example.com, www.example.com.
For SSL Certificate, please follow the tutorial here to upload your certificate to IAM if you haven't already, then refresh your certificates (there is a control next to the dropdown for this), select Custom SSL Certificate and select the one you've provisioned. This ensures that browsers recognize your SSL over HTTPS as valid. This is not necessary if you're not using HTTPS at all.
All following settings can be left at default, or chosen to meet your own specific requirements.
Route 53 (DNS)
You need to have a Hosted Zone set up for your domain (this is specific to Route 53 setup).
You need to set an A record to point to your Cloudfront distribution.
You should set a CNAME record for the www subdomain pointing to your CloudFront distribution, even if you don't plan on using it (later we'll go through setting Squarespace to use only the root domain by redirecting the www subdomain).
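For reference, the two records above can be expressed as a Route 53 change batch (the JSON accepted by `aws route53 change-resource-record-sets --change-batch`). This is a sketch, assuming example.com and a hypothetical distribution domain d111111abcdef8.cloudfront.net. Z2FDTNDATAQYW2 is the fixed hosted zone ID Route 53 uses for CloudFront alias targets, and the 300-second TTL is an arbitrary choice.

```javascript
// Fixed hosted zone ID used for every CloudFront alias target in Route 53.
const CLOUDFRONT_HOSTED_ZONE_ID = "Z2FDTNDATAQYW2";

// Builds UPSERT changes: an alias A record for the root domain and a CNAME for www.
function cloudFrontChangeBatch(domain, distributionDomain) {
  return {
    Changes: [
      {
        Action: "UPSERT",
        ResourceRecordSet: {
          Name: domain,
          Type: "A",
          AliasTarget: {
            HostedZoneId: CLOUDFRONT_HOSTED_ZONE_ID,
            DNSName: distributionDomain,
            EvaluateTargetHealth: false,
          },
        },
      },
      {
        Action: "UPSERT",
        ResourceRecordSet: {
          Name: `www.${domain}`,
          Type: "CNAME",
          TTL: 300, // arbitrary choice
          ResourceRecords: [{ Value: distributionDomain }],
        },
      },
    ],
  };
}

// Hypothetical values: replace with your own domain and distribution domain name.
console.log(JSON.stringify(cloudFrontChangeBatch("example.com", "d111111abcdef8.cloudfront.net"), null, 2));
```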
Squarespace
On your Squarespace site, simply go to Settings->Domains->Connect a Third-Party Domain. Once there, enter your domain and continue. Under the domain's settings, you can uncheck Use WWW Prefix if you'd like people accessing your site from www.example.com to be redirected to the root, example.com. I prefer this, but it's up to you. Under DNS Settings, the only record you need is the CNAME that points to verify.squarespace.com. Add this CNAME record to your DNS settings on Route 53 or your other DNS provider. Squarespace will never report the connection as fully complete, since we're using a custom deployment, but that doesn't matter.
Your site should now be served through CloudFront from your Squarespace deployment! Please note that DNS propagation takes time, so if you're unable to access the site, give it some time (up to several hours) to propagate.
Notes
I can't say exactly whether each and every one of the values set under Whitelist Cookies is necessary, but these are taken from using the Chrome Inspector to determine what cookies were present under the Cookie header in the request. Initially I tried to tell Cloudfront to whitelist the Cookie header itself, but it does not allow that (presumably because it wants you to use the cookie-specific whitelist). If your deployment is not working, see if there are more cookies being transmitted in your requests (under the Cookie header, the values you're looking for should look like my_cookie=somevalue;other_cookie=othervalue—my_cookie and other_cookie in my example are what you'd add to the whitelist).
The same procedure can be used to forward other headers entirely that may be needed via the Forward Headers whitelist. Simply inspect and see if there's something that looks like it might need to go through.
Remember, if you're not whitelisting a header or cookie, it's not getting to Squarespace. If you don't want to bother, or everything is effed (pardon my language), you can always set to allow all headers/cookies, although this adversely affects caching performance. So be conservative if you can.
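The cookie inspection described above can be partly automated. This is a small sketch with hypothetical helper names: paste the raw Cookie header from the browser's developer tools, and it lists the cookie names, plus any that are not yet in your CloudFront whitelist.

```javascript
// Extracts cookie names from a raw Cookie request header,
// e.g. "my_cookie=somevalue;other_cookie=othervalue".
function cookieNames(cookieHeader) {
  return cookieHeader
    .split(";")
    .map((pair) => pair.trim())
    .filter((pair) => pair.length > 0)
    .map((pair) => pair.split("=")[0].trim());
}

// Returns the names present in the request but missing from the whitelist.
function missingFromWhitelist(cookieHeader, whitelist) {
  const allowed = new Set(whitelist);
  return cookieNames(cookieHeader).filter((name) => !allowed.has(name));
}

const header = "my_cookie=somevalue;other_cookie=othervalue";
console.log(cookieNames(header)); // [ 'my_cookie', 'other_cookie' ]
console.log(missingFromWhitelist(header, ["my_cookie"])); // [ 'other_cookie' ]
```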
Hope this helps!
Here are the settings to get CloudFront working with Squarespace!
Behaviours:
Allowed HTTP Methods: ensure that you select GET, HEAD, OPTIONS, PUT, POST, PATCH, DELETE. Otherwise forms will not work.
Forward Headers: select Whitelist and choose 'Host'. Otherwise Squarespace will not know which website to load, and you get the message 'Website has expired' or similar.
Origins:
Origin Domain Name set as: ext-cust.squarespace.com
Origin Protocol Policy: select HTTPS so that traffic between the CDN and the origin is secure too.
General
Alternate Domain Names (CNAMEs): put both your www and non-www addresses here (e.g. example.com, www.example.com) and let Squarespace decide whether to redirect www to the root or vice versa.
You can now configure SSL on CloudFront
HTTPS You can now enforce HTTPS using a certificate for your site here rather than in Squarespace
Setting I'm unsure about still:
Forward Query Strings: not recommended for caching reasons, but I think disabling it could break things...
Route53
Create A records for www and root (e.g. example.com www.example.com) and set as an alias to your CloudFront distribution
Our website is a vertical search engine and we refer a lot of traffic offsite to partners sites.
We recently switched our website over to serve all traffic via HTTPS. We realised this might confuse some of our partners if they were looking at referrer stats and saw a drop in traffic attributed to us. Therefore, at the same time, we added the Content-Security-Policy: referrer origin header, and we can see that the referrer is correctly passed along by the browser.
Generally this is working fine but we have had complaints from users of Adobe SiteCatalyst (previously Omniture) who are no longer able to attribute traffic as being referred from us. We don't have access to SiteCatalyst to test this out. How does SiteCatalyst track referral traffic and is there a way to view all traffic split by different sources/referrers?
I don't know if this accounts for everything, since I don't have full context on both your end or your users' end, but here is some info / thoughts that might help.
By default, Adobe Analytics tracks referrer from document.referrer. This can be overridden by setting s.referrer.
In general, depending on how your site sends visitors to the other site and on browser security/privacy settings, document.referrer may or may not have a value. For example, Internet Explorer's default security/privacy settings suppress document.referrer in dynamically generated popup windows (e.g. window.open() calls).
So, and again this is just speculation because I don't know the full context, you may need to work something out with your users, e.g. explicitly pass the referring URL as a query parameter to the target page and have your users populate s.referrer with it if it exists. Something along the lines of:
if ( !document.referrer ) {
  s.referrer = s.Util.getQueryParam( 'refURL' );
}
Note: s.Util.getQueryParam is a utility function in the Adobe Analytics AppMeasurement library that returns the value of the specified query parameter, or an empty string if it doesn't exist. If your users are still using legacy H code, they should use the s.getQueryParam plugin instead, or any homebrewed method of reading a query parameter from the URL (older browsers have no built-in function for it; modern browsers provide URLSearchParams).
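Both halves of that workaround can be sketched as below, assuming the parameter name refURL from the snippet above. The partner and site domains, the function names, and the example URLs are hypothetical. On the receiving side, URLSearchParams stands in for a homebrewed query-string parser.

```javascript
// Sending side (your site): append the referring page to an outbound partner
// link as a "refURL" query parameter (URL handles the percent-encoding).
function withRefParam(targetUrl, referringUrl) {
  const url = new URL(targetUrl);
  url.searchParams.set("refURL", referringUrl);
  return url.href;
}

// Receiving side (partner's page): prefer document.referrer, else fall back to
// the query parameter, else an empty string.
function referrerFallback(documentReferrer, search) {
  if (documentReferrer) return documentReferrer;
  return new URLSearchParams(search).get("refURL") || "";
}

const link = withRefParam("https://partner.example/offer?id=7", "https://oursite.example/search?q=shoes");
console.log(link);
// In the browser the partner would call something like:
//   s.referrer = referrerFallback(document.referrer, window.location.search);
console.log(referrerFallback("", new URL(link).search)); // https://oursite.example/search?q=shoes
```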
I have several websites that get around 5% of their daily visits from spam referrers. There is one strange thing I noticed about these referrers: they show up in Google Analytics, but I cannot see them in my custom-designed table where I record all visitors to the site, so I think they only manipulate Google Analytics and never reach the site itself.
If you follow their link, they redirect you to some affiliates link.
I don't know whether they have impact on my SEO/SERP, but I would like to get rid of them. May I do that via htaccess file?
One peculiar aspect is that I get visitors from different forum like pages. E.g.: forum.topic221122.darodar.com, forum.topic125512.darodar.com etc., so I would like to block the full darodar.com domain.
Besides darodar.com, there are also econom.co and iloveitaly.co that are bothering my stats. Can I block them all from htaccess?
Most of the Spam in Google Analytics never access your site so you can't block them using any server-side solution.
Ghost spam hits GA directly and usually shows up for only a few days before disappearing; that's why some people think they blocked it with the .htaccess file, when it is just a coincidence.
This type of spam is easy to spot because the hostname is either fake or not set. (See image below)
The other type, Crawlers like semalt, actually access your site and can be blocked from the .htaccess file, however, there are just a few of them.
So in summary, to stop spam in Google Analytics:
Crawlers: server-side solutions or filters in GA
Ghosts: ONLY filters in GA
The only efficient solution to prevent being hit by ghost spam is by making an include filter with all your valid hostnames.
First you need to make a REGEX with all the valid hostnames, something like this (you can find them on the network report)
yoursite\.com|shoppingcart\.com|translateservice\.net
These are some examples; you might have more or fewer hostnames. Once you have the REGEX, follow the same steps as above and change this:
Go to the admin tab in Google Analytics
Select FILTER under the View Column > New Filter
Filter type Custom > Include > Filter Field Hostname
Filter Pattern Copy the hostname expression you built
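Writing the expression by hand is error-prone, because an unescaped dot matches any character. Here is a sketch that builds the pattern from a list of hostnames, assuming GA's default handling of filter patterns (case-insensitive, matching anywhere in the field). If you want exact hostname matches only, wrap the result as ^(...)$.

```javascript
// Escapes regex metacharacters such as the dots in domain names.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Joins the valid hostnames into one alternation for the include filter.
function hostnamePattern(hostnames) {
  return hostnames.map(escapeRegex).join("|");
}

const pattern = hostnamePattern(["yoursite.com", "shoppingcart.com", "translateservice.net"]);
console.log(pattern); // yoursite\.com|shoppingcart\.com|translateservice\.net

// Unanchored and case-insensitive, like the filter itself.
const re = new RegExp(pattern, "i");
console.log(re.test("www.yoursite.com")); // true
console.log(re.test("(not set)"));        // false
```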
For Crawlers you will have to create a different filter building an expression with all spammers
spammer1|spammer2|spammer3|spammer4|spammer5
Filter type Custom > Exclude > Filter Field Campaign source
Filter Pattern Copy the referral expression
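How the two filters combine can be simulated on a few made-up hits (hypothetical data and spammer list). GA applies a view's filters to incoming hits in the order they are listed, which the chained filter calls mirror.

```javascript
// Hypothetical hits: one real visit, one ghost hit, one crawler hit.
const hits = [
  { hostname: "www.yoursite.com", source: "google" },
  { hostname: "(not set)", source: "darodar.com" },       // ghost spam
  { hostname: "www.yoursite.com", source: "semalt.com" }, // crawler spam
];

const includeHostname = /yoursite\.com|shoppingcart\.com|translateservice\.net/i;
const excludeSource = /semalt\.com|darodar\.com/i; // extend with your own spammer list

function applyFilters(hitList) {
  return hitList
    .filter((hit) => includeHostname.test(hit.hostname)) // Custom > Include > Hostname
    .filter((hit) => !excludeSource.test(hit.source));   // Custom > Exclude > Campaign Source
}

console.log(applyFilters(hits)); // [ { hostname: 'www.yoursite.com', source: 'google' } ]
```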
Every time you work with filters, it is important to keep an unfiltered view.
If you need detailed steps for these solutions, you can check this complete guide about spam in Google Analytics.
Guide to stop and remove All the spam in Google Analytics
Hope it helps.
Hostname report Example
This blog post suggests that the spam referrers manipulate Google Analytics and never actually visit your site, so blocking them is pointless. Google Analytics offers filtering if you want to mitigate fake site hits.
Yes, you can block them with .htaccess, and actually you should.
Your .htaccess file could look like this:
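<IfModule mod_setenvif.c>
# Set spammers referral as spambot (patterns are case-insensitive regexes, so escape the dots)
SetEnvIfNoCase Referer darodar\.com spambot=yes
SetEnvIfNoCase Referer 7makemoneyonline\.com spambot=yes
## add as many as you find
# Apache 2.2 access syntax; on Apache 2.4 without mod_access_compat, use instead:
#   <RequireAll>
#     Require all granted
#     Require not env spambot
#   </RequireAll>
Order allow,deny
Allow from all
Deny from env=spambot
</IfModule>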
When traffic comes from these sites, they are blocked with this .htaccess, so the HTML is never loaded and therefore GA script is not fired up (from these sites).
They are trying to get traffic from you: once you see the incoming traffic in Google Analytics, you try to find out what the source is and visit that URL. It is harmless to your site, except that your statistics fill up with junk data.
Google Analytics should prevent this, the same way GMail prevents spam email.
According to this entry, they never visit your site; they send fake HTTP requests to GA using your UA code. So it seems pointless to block them with .htaccess or any other method, because they never actually enter your site; they only send fake "visit" data to Google.
We have found that .htaccess is a good way to stop this spam. I have implemented the solution below on my client's site, and it is working really well so far.
The best way is to match with a "contains" condition: e.g. for the spam domain priceg.com, check for priceg anywhere in the referrer URL.
This matters because many of these sites create subdomains and hit again, and when they tweak the URL, hard-coded conditions fail.
RewriteEngine On
RewriteCond %{HTTP_REFERER} (priceg) [NC,OR]
RewriteCond %{HTTP_REFERER} (darodar) [NC]
RewriteRule .* - [F]
It is explained in detail here
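The "contains" idea behind these conditions can be illustrated outside Apache. This sketch mirrors the case-insensitive substring match ([NC]) on the Referer header, using the two keywords above. Note the trade-off: a short keyword can also match a legitimate referrer that happens to contain it.

```javascript
// Keywords from the RewriteCond lines above.
const spamKeywords = ["priceg", "darodar"];

// Case-insensitive "contains" test, so every new subdomain is caught too.
function isSpamReferer(referer, keywords = spamKeywords) {
  const lower = referer.toLowerCase();
  return keywords.some((keyword) => lower.includes(keyword.toLowerCase()));
}

console.log(isSpamReferer("http://forum.topic221122.darodar.com/")); // true
console.log(isSpamReferer("https://www.google.com/"));               // false
```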
Apparently, this is done by a spammer communicating directly with Google Analytics using your website's account ID. So they effectively tell Google Analytics they visited your page when in fact they never did. They identify themselves to Analytics by means of a URL which THEY WANT YOU TO VISIT. So you see their traffic in Google Analytics and go check them out. They will have an Amazon affiliate account hooked up, for example, so they attempt to get a commission on your Amazon purchases.
So .htaccess did nothing for me when I was fighting this one; you need to create a filter that excludes a pattern like darodar\.com (with the dots escaped).
The really bad effect I have found from this is that it invalidates my website statistics.
You can restrict access using .htaccess, or by excluding ALL robot visits from Google Analytics tracking. If that doesn't work, set up Google Analytics filtering. More details on how to do that can be found here: http://www.wiyre.com/google-analytics-darodar-forum-spam-what-is-it/
They are Russian based but routing their spiders through China and the Philippines. Maybe it would be best to block the whole IP address at this point, they have multiple sub-domains.
Blocking any bots at your web server level makes no sense - spammers are sending fake requests to Google Analytics web server. All they have to know is website domain name and Google Analytics ID linked to it.
So you have to mask your Google Analytics ID at website code. For example, you can do like this at Google Analytics JS code:
ga('create', 'UA-X' + 'XXXXX' + 'XX-X', 'auto');
After this change, a spammer's bot would have to execute JS code to extract your Google Analytics ID (and not many bots are able to do that).
https://nobodyonsecurity.com/security/fighting-google-analytics-referrer-spam
.htaccess is not the best way. On my site, I use GA's Tracking Info > Referral Exclusion List option.
Regards!
Lunametrics posted a nice article to solve this issue using Google Tag Manager:
http://www.lunametrics.com/blog/2014/03/11/goodbye-to-exclude-filters-google-analytics/
I think the most effective way to avoid ghost spam is to add a custom dimension that lets you know the site was actually visited, because, as we know, ghost spam never visits the site.
ga('set', 'dimension1', "Hey I'm really here!!");
ga('send', 'pageview');
Simply add these lines to your pages and then add a filter that includes only hits where the dimension has the expected value ("Hey I'm really here!!" in this case).
I used this mod_rewrite rule for semalt:
RewriteEngine On
# One condition covers http/https, semalt.com and any of its subdomains (e.g. www.)
RewriteCond %{HTTP_REFERER} ^https?://([^.]+\.)*semalt\.com [NC]
RewriteRule .* - [F]
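or with the mod_setenvif module in .htaccess:
SetEnvIfNoCase Referer semalt\.com spambot=yes
# Anchored so that e.g. 217.23.11.150 is not matched as well
SetEnvIfNoCase REMOTE_ADDR "^217\.23\.11\.15$" spambot=yes
SetEnvIfNoCase REMOTE_ADDR "^217\.23\.7\.144$" spambot=yes
# Apache 2.2 access syntax (Apache 2.4 needs mod_access_compat for these lines)
Order allow,deny
Allow from all
Deny from env=spambot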
I even created an Apache, Nginx & Varnish blacklist plus Google Analytics segment to prevent referrer spam traffic, you can find it here:
https://github.com/Stevie-Ray/referrer-spam-blocker/
Filter future and historical ga spam of all types with the link provided. Hostname filtering is particularly easy.
https://www.ohow.co/ultimate-guide-to-removing-irrelevant-traffic-in-google-analytics/
2019 update
I may have a solution to this problem as I find none of the other solutions to be effective.
Let me address the problems of the existing solutions first
Add a filter for each referrer spam domain.
How many domains will you add?
Most of these referrer spam domains exist for some time and then disappear
Maintain a blacklist of referrer spam domains.
This gets even more complicated as they are basically endless in numbers.
You would have to keep updating the blacklist.
Also, the bigger the blacklist, the more time you need to scan it
Anything else such as maintaining a manual htaccess or something will require manual intervention which will not scale as your site becomes more popular
Anything automatic, such as using AI to detect patterns in how referrer spam domains appear, will be hit-or-miss
How do these bots work?
First, it is crucial to understand how these bots work
At the very least, they use regex patterns such as /UA-\d{6}/ to extract tracking IDs from the pages they crawl recursively, starting from a seed website
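To see why a literal tracking ID in the page source is enough for these bots, here is a sketch of the extraction step, assuming the bot simply fetches the HTML and runs a regex over it. /UA-\d{6}/ is the minimal pattern from the text; the one below is a hypothetical broader variant that also captures the full ID.

```javascript
// Hypothetical extraction step of a referrer-spam crawler: grep the HTML
// for every string shaped like a Universal Analytics tracking ID.
function extractTrackingIds(html) {
  const pattern = /UA-\d{4,10}-\d{1,4}/g;
  return [...html.matchAll(pattern)].map((match) => match[0]);
}

const page = "<script>ga('create', 'UA-1111111-1', 'auto');</script>";
console.log(extractTrackingIds(page)); // [ 'UA-1111111-1' ]
```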
I believe I have a solution that offers the following advantages
No need to maintain whitelists and blacklist
Will work against 99% of them easily and can always be modified to take it to 100%
Requires almost NO manual intervention
The idea is to NOT have a tracking ID at all in the script
Here is an example
script.
  //- Google Analytics ID
  var a = [85, 65, 45, 49, 49, 49, 49, 49, 49, 49, 49, 49, 45, 50];
  // Rebuild the ID once at runtime so the literal "UA-..." never appears in the HTML
  var trackingId = a.map(i => String.fromCharCode(i)).join("");
  var newScript = document.createElement("script");
  newScript.async = true;
  newScript.src = "https://www.googletagmanager.com/gtag/js?id=" + trackingId;
  document.head.appendChild(newScript);
  window.dataLayer = window.dataLayer || [];
  function gtag(){dataLayer.push(arguments);}
  gtag('js', new Date());
  gtag('config', trackingId, { 'send_page_view': false });
  // Feature detects Navigation Timing API support.
  if (window.performance) {
    // Gets the number of milliseconds since page load
    // (and rounds the result since the value must be an integer).
    var timeSincePageLoad = Math.round(performance.now());
    // Sends the timing event to Google Analytics.
    gtag('event', 'timing_complete', {
      'name': 'load',
      'value': timeSincePageLoad,
      'event_category': '#{title}'
    });
  }
We take a very simple approach, break the tracking ID of the form 'UA-1111111-1' into a char code array
Now we construct the tracking ID dynamically from the char code array at any point we need a reference to the tracking ID
The approach can be made far more complex: encrypting the numbers, using base 8 or hexadecimal, adding a fixed offset or a random offset on each run, or RSA-encrypting the tracking ID with a private key on the server and decrypting it with a public key. But the basic approach is REALLY fast (arrays in JS are fast) and can easily beat 99% of the bots
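As one example of the "fixed offset" variant, here is a sketch that shifts each char code by an arbitrary constant (7 here, a hypothetical choice). It is obfuscation only: anyone who runs the page's JavaScript can still recover the ID.

```javascript
// Arbitrary, hypothetical offset; any small integer works.
const OFFSET = 7;

// Turns the ID into shifted char codes to embed in the page.
function encodeId(id, offset = OFFSET) {
  return Array.from(id, (ch) => ch.charCodeAt(0) + offset);
}

// Reverses the shift at runtime to rebuild the ID.
function decodeId(codes, offset = OFFSET) {
  return String.fromCharCode(...codes.map((code) => code - offset));
}

const encoded = encodeId("UA-1111111-1");
console.log(JSON.stringify(encoded)); // embed this array instead of the ID
console.log(decodeId(encoded));       // UA-1111111-1

// A source scan such as /UA-\d{6}/ finds nothing in the encoded form.
console.log(/UA-\d{6}/.test(JSON.stringify(encoded))); // false
```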
I have a SaaS app where every user has a personal subdomain: username.domain.com. Every user has a personal blog at username.domain.com/blog.
Now I want to accept custom domains, e.g. www.mycustomblog.com would be an alias for username.domain.com/blog.
If someone browses to www.mycustomblog.com/123, the page username.domain.com/blog/123 should be served.
However, I do NOT want a redirect. The user should still see www.mycustomblog.com/123 in their address bar.
How can I achieve this behaviour? I have looked into Nginx reverse proxies, DNS CNAME records... but nothing seems to suit my needs. I can access both the custom domain DNS settings and all of the server's config files.
I think what you're looking for is a rewrite. However your described logic doesn't work:
www.mycustomblog.com -> username.domain.com/blog
appears to be missing a piece of identifying information on the left side. Perhaps www.mycustomblog.com/username? After that, it's just a matter of writing out the match/map statements to change the request to match what you've got on the server.
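One way to supply that identifying information is a server-side table keyed by the Host header. Here is a sketch of the match/map step, assuming a hypothetical lookup table from custom hostname to username. In practice this logic would live in the reverse proxy (e.g. an Nginx map) or in the application, which fetches the upstream page and returns it instead of redirecting, so the browser keeps showing the custom domain.

```javascript
// Hypothetical lookup table maintained by the SaaS app: custom hostname -> username.
const customDomains = {
  "www.mycustomblog.com": "username",
};

// Maps an incoming request (Host header + path) to the internal URL the proxy
// should fetch; returns null for unknown hosts so the caller can send a 404.
function upstreamUrl(host, path, table = customDomains) {
  const user = table[host.toLowerCase()];
  if (!user) return null;
  return `https://${user}.domain.com/blog${path === "/" ? "" : path}`;
}

console.log(upstreamUrl("www.mycustomblog.com", "/123")); // https://username.domain.com/blog/123
```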