Google analytics, privacy and latency - google-analytics

I'm considering putting the Google Analytics tracker on my blog to get better stats (right now I use Summary.net, which is fast but occasionally of questionable accuracy, though way better than awstats). At any rate I have 2 big concerns and though I should ask the community here for opinions:
Privacy. In the past I've gone so far as to block Google's IPs for adsense and tracker. Basically it seems like at this point everything you do online is visible to Google in some way or another, but at the same time I wonder if this really matters. Should I just shrug my shoulders and embed the code?
Latency. I'm not very technically inclined so I don't know much about this, but I heard that if Google is having issues, you site will as well, just by including their code. is that possible?
Thanks!

1) Impossible to answer. If you're running a website for an illegal drug trafficking network, then I probably wouldn't use Analytics. Otherwise, I personally don't care. You could monitor your HTTP traffic and see exactly what the payload is if you're concerned about them tracking anything more than simple IP / browser data. If you're concerned about them knowing "who" visited your site, and have a valid reason, then don't use it.
2) Yep. Have you ever noticed a page taking a while to load and in the status bar it says it's waiting on google-analytics.com? I see it on a somewhat frequent basis on both small and large websites. It's not too big of a problem though IMO.
I believe that it's a recommended best practice to place your analytics script as close to the bottom of your page code as possible (the same would be true for any external scripts).
I'll also add that I didn't really like the idea of Google Analytics much until I started using it. The benefits really seem to outweigh the drawbacks. Its reporting is just fantastic. I still use AwStats on the same sites, however, simply because the data is accessible...and it's nice to compare numbers.

Related

How to track HTML changes from another team that could break analytics rule conditions?

My department has analytic rule conditions in DTM that trigger events based on particular classes or custom data attributes. I'm concerned that if our dev team makes a change that would break the rule, we wouldn't find out until it's discovered that the metric is no longer tracking.
We're trying to future proof our scripts to allow for changing conditions (eg: using regex for changing class names &/or functions to traverse the DOM to find a condition without it needing to be hardcoded), but I thought someone here might have experience with this type of issue. How was it handled at your company?
**EDIT:**I'm exploring using custom Data Elements within DTM that are created with javascript that has multiple conditions for traversing the DOM in ways we've identified. So a sort of data layer that's controllable by my team.
Note: This isn't really an actual coding question; more of [analytics/marketing tag] coding principles/best practices. So I'm not entirely sure this question belongs on SO (maybe one of the other stack exchange sites, perhaps superuser.com?). But I'll answer here anyways.
TL;DR - You need to get site devs involved and have them take on some level of initial and ongoing ownership of it.
Tag managers sell themselves on being able to deploy tags without getting site devs involved, and many times this works out in the short term. But in my experience, this kind of passive deployment just doesn't work out in the long term, especially for websites that have active and regular changes over time.
In my experience, the only way to effectively help prevent site devs from inadvertently breaking the tracking from this, is to include the site devs in the deployment and make them take ownership of it on some level, so that it is something they are aware of within their own system/flow.
Sometimes it is as easy as having designated classes or attributes added to html tags on the page. For example, you can write a spec for the site devs to add a data-analytics='true' to any header, footer, CTA links on a given page, and tell the site devs this is something they need to keep as part of their workflow whenever they make changes to the site.
For more complicated things, you could spec for them to do something like broadcast a custom event for you to listen for. For example, maybe you have a purchase confirmation page and right now you have code in DTM to trigger based off the URL, or scrape the page for details about the purchase to push to tags. Instead, create a spec with instructions for the devs to put that in a data layer object and push to a custom event, and then create an event based rule for it.
The overall theme here is to create a spec document for all the things you want to be able to track on the website, that you know you can't reliably passively track without it breaking sooner or later, and hand the document to the devs and tell them they need to make it a part of their flow when making changes to the site. Bonus points if you can get them to loop you in whenever changes are going to be made and pushed to production, that you may go to your dev/qa version of your site to test to make sure your tracking still looks good.
The overall overall theme here is in order to prevent site devs from breaking your tracking, you need to be more proactive about making them and keeping them aware of your tracking, and in practice, this usually means putting some of the code work on their plate to own, so it's something in their history, in front of their face to see and know about. Because it is a lot easier for a dev to take notice of a data-analytics='true' in the header nav links they are about to restructure, than knowing hey, some piece of code in DTM relies on this current structure.. something that's not more directly in front of their face in their own code editor/environment.
Yes, actually accomplishing the above is often easier said than done. But it is the reality of the situation. Passively tracking things in a tag manager rarely works out for longer term stuff, short of "every page" tags that have little or no customization requirements at all.
I tell you from my own experience of over 10 years of working in the digital marketing and analytics industry, specifically with implementation, I have seen this time and time again. Too many times to count. Clients often want to and actually take the easier route of leaving the site devs out of the loop, all tracking requirements done solely through whatever the tag manager is capable of doing.
I've seen setups with hundreds of rules with trigger conditions based off scraping the page for some id or class, or some complex css selector dependent on 5 levels of html structure not changing. Or some random cookie you just assume means what you think it means. And you're spending more and more of your time playing whack-a-mole trying to re-adjust/fix individual rules/selectors as the next random change happens, and then one day comes a full site redesign and it's a nuclear bomb on all your tracking efforts.
And time and time again, without fail, eventually they wind up asking exactly what you've asked here, kicking themselves in the arse for the time and money they spent already on it for that "quick win", because nobody is confident in the data and they're wondering why they are allocating money/resources on tracking if it's just a bunch of trash, broken, pothole data. And the solution to it has always been "site dev awareness".
If it helps.. one card I sometimes play if I have a hard time convincing the dev team or other powers-that-be to jump on that bandwagon, is to point out that one of the biggest reasons for tracking websites is to help companies determine whether or not it's worth investing money in said website. If they can't determine that, then they may not be so inclined to do so, which means their need to even have a site dev team may also decline. To be more candid: it is something that helps justify their job.

Are there any text-only CAPTCHA systems available?

for some niche reasons I often find that a text-only captcha would be better than the traditional image/audio solutions. Some of the reasons I can think of off the top of my head:
Text-based browser support(lynx, links, etc)
To provide write-capable public APIs, but prevent massive amount of spam
For accessibility reasons (I've heard that the audio versions are very hard to understand to prevent voice-recognition software attacks)
Are there any text-only CAPTCHA systems available out there?
http://textcaptcha.com/ is a logic based, but also purely text based, captcha system., and is free too.
I don't think you have any control over the type of questions that are passed to you, but worth reading the documentation. Whilst it may pose problems for users with cognitive disabilities, it's on the right track, and an improvement on the standard recaptcha type captchas, which create difficulties for blind and vision impaired users alike.
All of the CAPTCHAs I've seen that don't use images or audio involve some sort of cognition:
What day of the week is tomorrow?
Who was the host of last week's SNL?
What is 2 + 3 plus four?

Prevent automated tools from accessing the website

The data on our website can easily be scraped. How can we detect whether a human is viewing the site or a tool?
One way is by calculating time which a user stays on a page. I do not know how to implement that. Can anyone help to detect and prevent automated tools from scraping data from my website?
I used a security image in login section, but even then a human may log in and then use an automated tool. When the recaptcha image appears after a period of time the user may type the security image and again, use an automated tool to continue scraping data.
I developed a tool to scrape another site. So I only want to prevent this from happening to my site!
DON'T do it.
It's the web, you will not be able to stop someone from scraping data if they really want it. I've done it many, many times before and got around every restriction they put in place. In fact having a restriction in place motivates me further to try and get the data.
The more you restrict your system, the worse you'll make user experience for legitimate users. Just a bad idea.
It's the web. You need to assume that anything you put out there can be read by human or machine. Even if you can prevent it today, someone will figure out how to bypass it tomorrow. Captchas have been broken for some time now, and sooner or later, so will the alternatives.
However, here are some ideas for the time being.
And here are a few more.
and for my favorite. One clever site I've run across has a good one. It has a question like "On our "about us" page, what is the street name of our support office?" or something like that. It takes a human to find the "About Us" page (the link doesn't say "about us" it says something similar that a person would figure out, though) And then to find the support office address,(different than main corporate office and several others listed on the page) you have to look through several matches. Current computer technology wouldn't be able to figure it out any more than it can figure out true speech recognition or cognition.
a Google search for "Captcha alternatives" turns up quite a bit.
This cant be done without risking false positives (and annoying users).
How can we detect whether a human is viewing the site or a tool?
You cant. How would you handle tools parsing the page for a human, like screen readers and accessibility tools?
For example one way is by calculating the time up to which a user stays in page from which we can detect whether human intervention is involved. I do not know how to implement that but just thinking about this method. Can anyone help how to detect and prevent automated tools from scraping data from my website?
You wont detect automatic tools, only unusual behavior. And before you can define unusual behavior, you need to find what's usual. People view pages in different order, browser tabs allow them to do parallel tasks, etc.
I should make a note that if there's a will, then there is a way.
That being said, I thought about what you've asked previously and here are some simple things I came up with:
simple naive checks might be user-agent filtering and checking. You can find a list of common crawler user agents here: http://www.useragentstring.com/pages/Crawlerlist/
you can always display your data in flash, though I do not recommend it.
use a captcha
Other than that, I'm not really sure if there's anything else you can do but I would be interested in seeing the answers as well.
EDIT:
Google does something interesting where if you're looking for SSNs, after the 50th page or so, they will captcha. It begs the question to see whether or not you can intelligently time the amount a user spends on your page or if you want to introduce pagination into the equation, the time a user spends on one page.
Using the information that we previously assumed, it is possible to put a time limit before another HTTP request is sent. At that point, it might be beneficial to "randomly" generate a captcha. What I mean by this, is that maybe one HTTP request will go through fine, but the next one will require a captcha. You can switch those up as you please.
The scrappers steal the data from your website by parsing URLs and reading the source code of your page. Following steps can be taken to atleast making scraping a bit difficult if not impossible.
Ajax requests make it difficult to parse the data and require extra efforts in getting the URLs to be parsed.
Use cookie even for the normal pages which do not require any authentication, create cookies once the user visits the home page and then its required for all the inner pages.This makes scraping a bit difficult.
Display the encrypted code on the website and then decrypt it on the loadtime using javascript code. I have seen it on a couple of websites.
I guess the only good solution is to limit the rate that data can be accessed. It may not completely prevent scraping but at least you can limit the speed at which automated scraping tools will work, hopefully below a level that will discourage scraping the data.

Logging a user session for playback

Running an MVC2 site against IIS7 and would like to capture more detail of how users traverse the site - ideally to the point of being able to replay even the duration between mouse clicks - feedback of where people pause and/or backtrack.
I could do this with flash but that's no longer an option. Now it's just IIS7 via asp.net f4. IIS7 _should be able to provide this via 3rd party extensions - especially for this sort of niche need. I'm willing to consider client-side .net components but this sure seems to be the responsibility of the server.
[opps...does this belong on serverfault?]
thx
justSteve. Here is a solution that we have used:
http://www.seevolution.com/
I don't think that it gives time between clicks, but it does give very detailed tracking considering it's price (I don't know if that's an issue). We have really liked it. Fantastic detail.
You could also roll your own solution. Using jQuery and the $(document).click() function, you can log when they click, and the points on the screen. Then every couple of minutes, serialize it and fire it off to the server. You can get extremely fine-grained detail that way. The nice thing with seevolution is that they've done all of the work for you already, but it probably isn't as detailed as you would like.
JMax
Maybe not the "in-house" solution you're after but we are about to implement SessionCam at my company, which seems like a pretty good match for what you're looking for. Not having actually finished implementing it yet, I can't vouch for it in terms of quality at this point - but the description of the product certainly matches.
You aren't going to be able to capture the level of detail you need using a solely server-side solution. There needs to be a degree of client-side work - whether it's in flash or javascript - to capture things such as where the mouse is hovering (for heatmaps etc).
I personally haven't used this product, but a friend of mine spoke highly of it.
Clicktale

What might my user have installed thats going to break my web app?

There are probably thousands of applications out there like 'Google Web Accelerator' and all kinds of popup blockers. Then theres header blocking personal firewalls, full site blockers, and paranoid cookie monsters.
Fortunately Web Accelerator is now defunct (I suggest you read the above article - its actually quite funny what issues it caused) but there are so many other plugins and third party apps out there that its impossible to test them all with your app until its out in the wild.
What I'm looking for is advice on the most important things to remember when writing a web-app (whatever technology) with respect to ensuring the user's environment isnt going to break it. Kind of like a checklist.
Whats the craziest thing you've experienced?
PS. I may have linked to net-nanny above, but I'm not trying to make a porn site
The best advice I can give is to program defensively. For example, don't assume that all of your scripts may be loaded. I've seen cases where AdBlocker Plus will block 1/10 scripts that are included in a page just because it has the word "ad" in the name or path. While you can work around this by renaming the file, it's still good to check that a particular object exists before using it.
The weirdest thing I've seen wasn't so much a browser plugin but a firewall/proxy configuration at a user's workplace. They were using a squid proxy that was trying to remove ads by replacing any image HTTP request that it thought was an ad with a single pixel GIF image. Unfortunately it did this for non-GIF images too so when our iPhone application was expecting a PNG image and got a GIF, it would crash.
Internet Explorer 6. :)
No, but seriously. Firefox plugins like noscript and greasemonkey for one, though those are likely to be a very small minority.
Sometimes the user's environment means a screen reader (or even a braille interface like this). If your layout is in any way critical to the content being delivered as intended, you've got a problem right there.
Web pages break, fact of life; the closer you have been coding and designing up against standards, the less your fault it is.
Something I have checked in the past is loading some of the more popular toolbars that people tend to install (Google, Yahoo, MSN, etc) and seeing how that affects the users experience.
To a certain extent it is difficult to preempt which of the products you mentioned will be used by your users since there are so many. I would say your best bet is to test for the most frequent products that your user base may employ and roll with the punches for the rest. If you have the time to test other possible scenarios, by all means do.
Also, making it easy for your users to report possible issues also helps lessen the time it takes to get a fix in place should it be something you can work around.

Resources