We are just about to release a big update to our website. It is a complete rebuild. Our domain is staying the same and the website generally functions the same way (the business hasn't changed and we are still selling the same stuff). But every page has changed and most page URLs have also changed.
My question is, how should we deal with Google Analytics in this situation? Should we stick with the original GA account and simply start feeding information from the new website instead? Or should we make a new account, or just a new property, or view?
I think it makes sense to stick with the original account and property, so we can easily derive meaningful stats about the effect of the website upgrade on performance. However, I'm worried that having a completely different URL structure mixed in with the older structure will make things difficult to dig through.
Am I right that sticking with the original account and property is sensible in this situation, and does anyone have any other general pointers?
Thanks.
Related
My department has analytic rule conditions in DTM that trigger events based on particular classes or custom data attributes. I'm concerned that if our dev team makes a change that would break the rule, we wouldn't find out until it's discovered that the metric is no longer tracking.
We're trying to future proof our scripts to allow for changing conditions (eg: using regex for changing class names &/or functions to traverse the DOM to find a condition without it needing to be hardcoded), but I thought someone here might have experience with this type of issue. How was it handled at your company?
**EDIT:**I'm exploring using custom Data Elements within DTM that are created with javascript that has multiple conditions for traversing the DOM in ways we've identified. So a sort of data layer that's controllable by my team.
Note: This isn't really an actual coding question; more of [analytics/marketing tag] coding principles/best practices. So I'm not entirely sure this question belongs on SO (maybe one of the other stack exchange sites, perhaps superuser.com?). But I'll answer here anyways.
TL;DR - You need to get site devs involved and have them take on some level of initial and ongoing ownership of it.
Tag managers sell themselves on being able to deploy tags without getting site devs involved, and many times this works out in the short term. But in my experience, this kind of passive deployment just doesn't work out in the long term, especially for websites that have active and regular changes over time.
In my experience, the only way to effectively help prevent site devs from inadvertently breaking the tracking from this, is to include the site devs in the deployment and make them take ownership of it on some level, so that it is something they are aware of within their own system/flow.
Sometimes it is as easy as having designated classes or attributes added to html tags on the page. For example, you can write a spec for the site devs to add a data-analytics='true' to any header, footer, CTA links on a given page, and tell the site devs this is something they need to keep as part of their workflow whenever they make changes to the site.
For more complicated things, you could spec for them to do something like broadcast a custom event for you to listen for. For example, maybe you have a purchase confirmation page and right now you have code in DTM to trigger based off the URL, or scrape the page for details about the purchase to push to tags. Instead, create a spec with instructions for the devs to put that in a data layer object and push to a custom event, and then create an event based rule for it.
The overall theme here is to create a spec document for all the things you want to be able to track on the website, that you know you can't reliably passively track without it breaking sooner or later, and hand the document to the devs and tell them they need to make it a part of their flow when making changes to the site. Bonus points if you can get them to loop you in whenever changes are going to be made and pushed to production, that you may go to your dev/qa version of your site to test to make sure your tracking still looks good.
The overall overall theme here is in order to prevent site devs from breaking your tracking, you need to be more proactive about making them and keeping them aware of your tracking, and in practice, this usually means putting some of the code work on their plate to own, so it's something in their history, in front of their face to see and know about. Because it is a lot easier for a dev to take notice of a data-analytics='true' in the header nav links they are about to restructure, than knowing hey, some piece of code in DTM relies on this current structure.. something that's not more directly in front of their face in their own code editor/environment.
Yes, actually accomplishing the above is often easier said than done. But it is the reality of the situation. Passively tracking things in a tag manager rarely works out for longer term stuff, short of "every page" tags that have little or no customization requirements at all.
I tell you from my own experience of over 10 years of working in the digital marketing and analytics industry, specifically with implementation, I have seen this time and time again. Too many times to count. Clients often want to and actually take the easier route of leaving the site devs out of the loop, all tracking requirements done solely through whatever the tag manager is capable of doing.
I've seen setups with hundreds of rules with trigger conditions based off scraping the page for some id or class, or some complex css selector dependent on 5 levels of html structure not changing. Or some random cookie you just assume means what you think it means. And you're spending more and more of your time playing whack-a-mole trying to re-adjust/fix individual rules/selectors as the next random change happens, and then one day comes a full site redesign and it's a nuclear bomb on all your tracking efforts.
And time and time again, without fail, eventually they wind up asking exactly what you've asked here, kicking themselves in the arse for the time and money they spent already on it for that "quick win", because nobody is confident in the data and they're wondering why they are allocating money/resources on tracking if it's just a bunch of trash, broken, pothole data. And the solution to it has always been "site dev awareness".
If it helps.. one card I sometimes play if I have a hard time convincing the dev team or other powers-that-be to jump on that bandwagon, is to point out that one of the biggest reasons for tracking websites is to help companies determine whether or not it's worth investing money in said website. If they can't determine that, then they may not be so inclined to do so, which means their need to even have a site dev team may also decline. To be more candid: it is something that helps justify their job.
I am working on a project where I do not have access to page source nor can I ask client's IT team to create in-page/onload dataLayer e.g. having dataLayer before the tag. Is the idea of implementing enhanced e-commerce tracking remotely possible via dataLayer.push e.g. build dataLayer as you go?I have a very little knowledge of dataLayer.push (which I am starting to read up). Question is :
Is dataLayer.push the correct way move forward? Have anyone done this
before?
what issue I might face e.g. "add to cart" event, remove cart or
category page any working example? I still don't have a clear view of
the workflow here.
What are the downside of doing it this way beside "style/css" based
trigger might fall off during future site design update.
Thanks.
Your main problem is that you need to get the data from somewhere. Usually without a dataLayer this means DOM scraping and then assembling the scraped data into a dataLayer in a custom javascript function. Drawbacks are the same as with your stle/css based triggers:
Implementation is strongly coupled to the page layout
data might be missing, or require cleaning (if mixed with html or unrelated text)
potentially expensive in clients CPU time
custom javascript introduces new points of failure (i.e. can you test rigorously enough to guarantee that there are no side effects)
If you do workarounds around your client's ITs shortcomings remember that you own them - you will be responsible if the workarounds break, or have side effects, or need to be amended to account for new features. Make sure that you are very well compensated for that risk (and personally with the experience I have now I would not do this at all).
I think it's important to acknowledge here that the Data Layer is just one way of storing this information, and is used solely because it's possible to push data to it from the page.
If you can't write the code into the site itself, don't push information to the data layer. Just keep that information for yourself, in GTM's variables. You'll save yourself a huge headache, and a bit of computation too.
DOM scraping is a perfectly reasonable way to get hold of information, but you will run into some barriers.
You're going to have to write a lot of JavaScript to get the data you need.
Some buttons may turn out to be composed of several elements that you'll have to cover with your triggers etc.
Any changes to the site will potentially ruin your code.
Not everything is available to you, especially verification of data (checking if a purchase went through is probably no longer possible before reporting the transaction).
The data on our website can easily be scraped. How can we detect whether a human is viewing the site or a tool?
One way is by calculating time which a user stays on a page. I do not know how to implement that. Can anyone help to detect and prevent automated tools from scraping data from my website?
I used a security image in login section, but even then a human may log in and then use an automated tool. When the recaptcha image appears after a period of time the user may type the security image and again, use an automated tool to continue scraping data.
I developed a tool to scrape another site. So I only want to prevent this from happening to my site!
DON'T do it.
It's the web, you will not be able to stop someone from scraping data if they really want it. I've done it many, many times before and got around every restriction they put in place. In fact having a restriction in place motivates me further to try and get the data.
The more you restrict your system, the worse you'll make user experience for legitimate users. Just a bad idea.
It's the web. You need to assume that anything you put out there can be read by human or machine. Even if you can prevent it today, someone will figure out how to bypass it tomorrow. Captchas have been broken for some time now, and sooner or later, so will the alternatives.
However, here are some ideas for the time being.
And here are a few more.
and for my favorite. One clever site I've run across has a good one. It has a question like "On our "about us" page, what is the street name of our support office?" or something like that. It takes a human to find the "About Us" page (the link doesn't say "about us" it says something similar that a person would figure out, though) And then to find the support office address,(different than main corporate office and several others listed on the page) you have to look through several matches. Current computer technology wouldn't be able to figure it out any more than it can figure out true speech recognition or cognition.
a Google search for "Captcha alternatives" turns up quite a bit.
This cant be done without risking false positives (and annoying users).
How can we detect whether a human is viewing the site or a tool?
You cant. How would you handle tools parsing the page for a human, like screen readers and accessibility tools?
For example one way is by calculating the time up to which a user stays in page from which we can detect whether human intervention is involved. I do not know how to implement that but just thinking about this method. Can anyone help how to detect and prevent automated tools from scraping data from my website?
You wont detect automatic tools, only unusual behavior. And before you can define unusual behavior, you need to find what's usual. People view pages in different order, browser tabs allow them to do parallel tasks, etc.
I should make a note that if there's a will, then there is a way.
That being said, I thought about what you've asked previously and here are some simple things I came up with:
simple naive checks might be user-agent filtering and checking. You can find a list of common crawler user agents here: http://www.useragentstring.com/pages/Crawlerlist/
you can always display your data in flash, though I do not recommend it.
use a captcha
Other than that, I'm not really sure if there's anything else you can do but I would be interested in seeing the answers as well.
EDIT:
Google does something interesting where if you're looking for SSNs, after the 50th page or so, they will captcha. It begs the question to see whether or not you can intelligently time the amount a user spends on your page or if you want to introduce pagination into the equation, the time a user spends on one page.
Using the information that we previously assumed, it is possible to put a time limit before another HTTP request is sent. At that point, it might be beneficial to "randomly" generate a captcha. What I mean by this, is that maybe one HTTP request will go through fine, but the next one will require a captcha. You can switch those up as you please.
The scrappers steal the data from your website by parsing URLs and reading the source code of your page. Following steps can be taken to atleast making scraping a bit difficult if not impossible.
Ajax requests make it difficult to parse the data and require extra efforts in getting the URLs to be parsed.
Use cookie even for the normal pages which do not require any authentication, create cookies once the user visits the home page and then its required for all the inner pages.This makes scraping a bit difficult.
Display the encrypted code on the website and then decrypt it on the loadtime using javascript code. I have seen it on a couple of websites.
I guess the only good solution is to limit the rate that data can be accessed. It may not completely prevent scraping but at least you can limit the speed at which automated scraping tools will work, hopefully below a level that will discourage scraping the data.
I have started developing a webpage and recently hired someone to write code to display a customized feed (powered by API) in the middle panel on http://farmball.com/. Note that this is not the RSS feed tied to the site blog. The feed ties to my account on another site. There is no RSS link for an average user to subscribe to the feed. I've taken the site out of maintenance mode to ask anyone here with scraping/hacking experience how someone would most easily go about 'taking' the feed and displaying it on their own site. More importantly, what can I do to prevent it?
^Updated for re-wording
You can't.
If you are going to expose an RSS feed which you don't want others to be able to display on their site then you are completely missing the point of RSS. The entire reason for Really Simple Syndication (RSS) is to make your content externally consumable- whether that's in an RSS Reader or through someone simply printing its content on their own website.
Why are you including an RSS feed if you do not want someone to be able to consume it?
what can I do to prevent...'taking' the feed and displaying it on their own site?
Nothing. Preventing reuse goes against the basic concept of RSS, which is to make it as easy as possible for anyone to do anything they want with it. It was designed from the ground up to be Really Simple to Syndicate, not Really Hard to Retransmit Without Permission.
You could restrict access to the feed itself to trusted users only by making them provide some credentials or pass in a key to the feed (e.g. yoursite.rss?mykey=abc123). But you cannot control use. Only access.
Be explicit about your license. It isn't a technology solution, as others have mentioned, the technology is an open technology-- this isn't DRM! But if you ask in each post that people who use this feed to not repost/fail to give credit/etc then some people will respond to the request.
Otherwise, you're better off putting your content behind a password and using a paid subscription model for distributing your content.
This is a DRM problem essentially. If you had some technique that you could put content on the web without having it redistributable, the music industry would love you.
It is possible to try to prevent redistribution. One technique you could try is embedding a signature of some sort into the feed for each user who you require to sign up. If the content is found on the web, you can identify and ban the user who redistributed your content.
This is avoidable too, by getting multiple accounts and normalizing the content to remove fingerprints. For the would-be pirate, this requires more effort than they may be willing to put in. Your signature could be a unique whitespace pattern, tiny variances in the timestamps on posts, misplaced pixels in videos, or any other thing you can vary slightly without end users noticing.
use .htpassword
better yet, don't put something private in a public place where it's likely to get picked up by software automatically. Like others have said, it's a pretty odd question, if you're trying to figure something else out, you're better off being explicit with what you want to know.
Has anyone ever tried to integrate AspDotNetStorefront and Sitecore? I've been trying for the past couple of days to come up with a way to get the two systems to play nicely together, but it doesn't seem feasible from what I can tell. A couple issues I've run across so far:
Authentication between the two (AspDotNetStorefront has its own implementation, Sitecore just uses/extends .NET Membership)
The main DLL for AspDotNetStorefront is what pops up in the stack trace when I get yellow-screened, but that DLL is obfuscated so I can't figure out what the problem is.
The biggest issue is that we need to keep our existing AspDotNetStorefront application as an e-commerce backend and use Sitecore to do everything else. AspDotNetStorefront has a CMS as part of it, but it's really not an acceptable solution for anything but really basic content pages.
Any thoughts on how I might go about this?
EDIT:
I've decided to break this whole thing down into the different problems that I am facing at the moment and solve each one as efficiently as I know how. I'll detail the ones I have here and then update when I run into new ones.
Problem 1: Authentication between the two systems.
This one isn't too bad actually if you're knowledgeable about forms authentication tickets, which I wasn't at the time but am learning quickly enough. As long as the two systems share the same encryption info, it's easy enough to pass information back and forth between them using cookies as stated below in the accepted answer. The other kicker is that I needed to set the CustomerGUID in the AspDotNetStorefront Customer table to be the user ID from the Sitecore user tables (standard ASP.NET membership). So far this approach seems to work pretty well (I'm still in the proof of concept stage at the moment.
Another thing to keep in mind if you ever need to attempt this is that AspDotNetStorefront comes with a web service that you can use to basically do anything you need. Since they use the same encryption keys, I am able to log in on the storefront side using this service more securely than just passing over clear text passwords (I had to write the method myself, I don't believe it comes standard, if I am mistaken please let me know). Although I doubt it's a huge deal since it all happens server side anyways.
Problem 2: Getting at the product data
This one was a little more troublesome. The aforementioned web service has a few issues I've had difficulty working around. However, since the databases are going to be on the same server, I simply decided that since all I really need is the price and ID I would go ahead and set the ProductGUID column of each product in the Storefront database to match the Sitecore item ID of the corresponding item in the Sitecore database. This way I just need a quick query to grab the ProductID and price information which is only used in a few places. Everything else is going to be housed in Sitecore.
If anyone has anything to add feel free, as far as I can tell from Google, no one has actually done this before, so I'm having a lot of trouble finding resources on this particular topic.
UPDATE:
The integration is in fact possible and our site has been up for a week and a half now with very few integration related problems. This isn't something I recommend doing really on a personal level, but it is in fact possible to pull off.
I know ASPDotNetStorefront and other CMS systems (but not Sitecore). If I was approaching this, I would probably start simple and create a custom URL structure for sitecore 'content' pages that ASPDNSF would direct to Sitecore to handle. [possibly replacing the existing topics system in ASPDNSF]. So, for example: a URL such as www.domain.com/p-1234-aproductpage.aspx would be handled by ASPDNSF whereas www.domain.com/content/123/a-content-page would get sent to Sitecore to render. This is a straightforward web.config edit.
Security sharing across the systems should be possible across the same domain as the cookie information will be available (you should be able to create some code in Sitecore using the ASPDNSFCommon.dll and a cast of HttpContext.Current.User into a AspDotNetStorefrontPrincipal class to detect if a customer is logged in)
Another way to approach the problem would be to write a function that retrieved Sitecore content from the database based on a URL id and then write an ASPDNSF XML template to use the function to retrieve this content based on the URL. For example, you could create a custom URL structure in ASPDNSF such as www.domain.com/sc-1234-sitecore-content-item.aspx which is sent to your custom code; 1234 is used as the sitecore content id and the XML template retrieves the content and renders it on screen.
This second approach has the advantage of using Sitecore for all non-product content management while keeping the live application in ASPDNSF. Also one set of design templates and all your security issues go.