How to Sample Adobe Analytics (Omniture) Data

I can't find anything on the web about how to sample Adobe Analytics data. I need to integrate Adobe Analytics into a new website with a ton of traffic, and the stakeholders want to sample the data to avoid exorbitant server call costs. I'm using DTM but am not sure whether that will help or be a non-factor. Can anyone either point me to some documentation or give me some direction on how to do this?

Adobe Analytics does not have any built-in method for sampling data, neither on their end nor in the js code.
DTM doesn't offer anything like this either. It doesn't have any (exposed) mechanisms in place to evaluate all requests made to a given property (container); any rules that extend state beyond "hit" scope are cookie based.
Adobe Target does offer the ability to output code based on a % of traffic, so you can achieve sampling this way, but really, you're just trading one server call cost for another.
Basically, your only solution would be to create your own server-side framework for conditionally outputting the Adobe Analytics (or DTM) tag, to achieve sampling with Adobe Analytics.
Update:
@MichaelJohns' comment below:
We have a file that we use as a boot strap file to serve the DTM file.
What I think we are going to do is use some JS logic and cookies
around that to determine if a visitor should be served the DTM code.
Okay, well maybe I'm misunderstanding what your goal is here (but I don't think I am), but that's not going to work.
For example, if you only want to output tracking for 50% of visitors, how would you use JavaScript and cookies alone to achieve this? In order to know that you are only filtering out 50%, you need to know the total number of people in play. By themselves, JavaScript and cookies only know about ONE browser, ONE person. They have no way of knowing anything about the other visitors unless you have some sort of shared state between all of them, like keeping a count in a database server-side.
The best you can do solely with JavaScript and cookies is to flip a coin: in this 50% example, you'd pick a random number between 1 and 100; the lower half gets no tracking, the upper half gets tracking.
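Purely as an illustration of that coin flip (the trackVisitor cookie name and the 50% figure are made up for the example), the client-side logic might look like this:

```typescript
// Client-side "coin flip" sampler sketch; the trackVisitor cookie name is hypothetical.
// Each NEW browser decides for itself whether it gets tracked; nothing here
// knows anything about the rest of the traffic.
function shouldTrack(samplePercent: number): boolean {
  const match = document.cookie.match(/(?:^|;\s*)trackVisitor=(yes|no)/);
  if (match) {
    return match[1] === "yes"; // returning visitor: reuse the earlier decision
  }
  const decision = Math.random() * 100 < samplePercent ? "yes" : "no";
  document.cookie = `trackVisitor=${decision}; path=/; max-age=${60 * 60 * 24 * 365}`;
  return decision === "yes";
}

if (shouldTrack(50)) {
  // load the Adobe Analytics / DTM tag here
}
```

Each new browser makes its decision in isolation, which is exactly why the split can drift badly, as described next.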
The problem with this is that it is possible for the pendulum to swing 100% one way or the other. It is the same principle as flipping a coin 100 times in a row: it is entirely possible that it can land on tails all 100 times.
In theory, the trend over time should average out to 50/50, but this has a major flaw: you may get one month with a ton of traffic and another month with very little. Or you could have a week with very little traffic followed by one day with a lot of traffic. And you really have no idea how that will manifest over time; you can't know which way your pendulum is swinging unless you ARE actually recording 100% of the traffic to begin with. The effect of all this is that it will absolutely destroy your trended data, and trends are at the core of making any kind of meaningful analysis.
So basically, if you really want to reliably output tracking to a % of traffic, you will need a mechanism in place that does in fact record 100% of traffic. If I were going to roll my own homebrewed "sampler", I would do this:
In either a flatfile or a database table I would have two columns, one representing "yes", one representing "no". And each time a request is made, I look for the cookie. If the cookie does NOT exist, I count this as a new visitor. Since it is a new visitor, I will increment one of those columns by 1.
Which one? It depends on what percent of traffic I am wanting to (not) track. In this example, we're doing a very simple 50/50 split, so really, all I need to do is increment whichever one is lower, and in the case that they are currently equal, I can pick one at random. If you want to do a more uneven split, e.g. 30% tracked, 70% not tracked, then the formula becomes a bit more complex. But that's a different topic for discussion (also, there are a lot of papers, documents, and wikis out there published by people a lot smarter than me that can explain it a lot better than I can!).
Then, if it is fated that I incremented the "yes" column, I set the "track" cookie to "yes". Otherwise I set the "track" cookie to "no".
Then in my controller (or bootstrap, router, whatever all requests go through), I would look for the cookie called "track" and see if it has a value of "yes" or "no". If "yes", I output the tracking script. If "no", I do not.
So in summary, process would be:
Request is made
Look for cookie.
If cookie is not set, update database/flatfile incrementing either yes or no.
Set cookie with yes or no.
If cookie is set to yes, output tracking
If cookie is set to no, don't output tracking
Note: Depending on the language/technology of your server, the cookie won't actually be set until the next request, so you may need to add logic to look for a returned value from the db/flatfile update and fall back to checking the cookie value in the last two steps.
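To make the flow above concrete, here is a rough sketch - not Adobe's or DTM's API, just one possible shape assuming a plain Node-style server; the flat file path, cookie name, and DTM library URL are placeholders:

```typescript
// Sketch of the "yes/no counter + cookie" flow described above (not Adobe's API).
// The flat file path, cookie name, and DTM library URL are placeholders.
import * as http from "http";
import * as fs from "fs";

const COUNTS_FILE = "./sample-counts.json";

function loadCounts(): { yes: number; no: number } {
  try {
    return JSON.parse(fs.readFileSync(COUNTS_FILE, "utf8"));
  } catch {
    return { yes: 0, no: 0 };
  }
}

// New visitor: top up whichever bucket is behind (coin flip on a tie) for a 50/50 split.
function decideForNewVisitor(): "yes" | "no" {
  const counts = loadCounts();
  let decision: "yes" | "no";
  if (counts.yes < counts.no) decision = "yes";
  else if (counts.no < counts.yes) decision = "no";
  else decision = Math.random() < 0.5 ? "yes" : "no";
  counts[decision] += 1;
  fs.writeFileSync(COUNTS_FILE, JSON.stringify(counts));
  return decision;
}

http
  .createServer((req, res) => {
    const match = (req.headers.cookie || "").match(/(?:^|;\s*)track=(yes|no)/);
    const track = match ? match[1] : decideForNewVisitor();
    if (!match) {
      // First visit: remember the decision so every later request is consistent.
      res.setHeader("Set-Cookie", `track=${track}; Path=/; Max-Age=31536000`);
    }
    const tag =
      track === "yes"
        ? '<script src="//assets.adobedtm.com/your-dtm-library.js"></script>' // placeholder URL
        : "";
    res.setHeader("Content-Type", "text/html");
    res.end(`<html><head>${tag}</head><body>...</body></html>`);
  })
  .listen(3000);
```

A flat file like this has an obvious race condition under concurrent requests, so a real implementation would want an atomic counter in a database instead.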
Another, more general note: you should beware of sampling. It is true that some tracking tools (most notably Google Analytics) sample data. But the thing is, GA initially records all of the data and then uses complex algorithms to sample from there, including excluding/exempting certain key metrics (like purchases, goals, etc.) from being sampled.
Just think about that for a minute. Even if you take the time to set up a proper "sampler" as described above, you are basically throwing out the data that proves people are doing the key things on your site - the important things that help you decide where to go in giving visitors a better experience. The only way around that is to start recording everything internally and factor those things into whether or not to send the data to AA.
But all that aside: look, I will agree that hits are something to be concerned about on some level. I've worked with very, very large clients with effectively unlimited budgets, and even they worry about hit costs racking up.
But the bottom line is you are paying for an enterprise-level tool. If you are concerned about the Adobe Analytics cost relative to your site traffic, maybe you should consider moving away from Adobe Analytics toward a different tool like GA, or some other tool that doesn't charge by the hit. Adobe Analytics is an enterprise-level tool that offers a lot more than most other tools, and it is priced accordingly. No offense, but IMO that's like leasing a Mercedes and then cheaping out on the quality of gasoline you use.

Related

Google Analytics 4 vs. UA: Why are Event Counts Different?

My event counts between GA4 and UA are different. They are not drastically different (maybe about 10%), but the numbers are still off. If the tags and triggers are all the same in GTM, shouldn't the event counts be identical? What would cause them to be 10% off, or is this normal?
First of all, it has by now become widely agreed that GA4's pre-built reports are less than reliable. The current advice is to avoid the pre-built reports in GA4 and instead either use Explorations or, preferably, export the data and analyze it somewhere else entirely if you have the resources for it.
Secondly, make sure you don't have Google Signals enabled, since this changes the thresholding/sampling logic. Also, switch to device-only reporting; it will help with thresholding. More on it here. It's important that Signals are never enabled: it looks like the thresholding logic won't be relaxed even if you disable them after enabling them. Some report that thresholding eases after a few months; some claim that's because Signals have already affected the data, and the data can't then be restored.
Event parameter cardinality. Here's more about it. Despite what Google claims, GA4 is still full of bugs and unpleasant features. One of them is the cardinality of your event parameters' values: keep it low. Otherwise, sampling kicks in hard and you end up seeing a large percentage of your events lumped into (other), even when you don't use the high-cardinality dimension in your report.
Data retention. See the limitations on it here. Yep, no more free access to old data for your precious ad-hoc YoY analysis, so if you're counting old events, no luck. UA will show them to you, but GA4 wipes them. GA4 still tries to maintain the pre-built aggregated reports, but you can't drill into them the way you could in UA, and they're not accurate anyway.
These are generic suggestions. More debugging would have to be done on your side to find out exactly which data points, at which times, aren't being counted. Data exports to BQ would help narrow it down. But at this moment, the general consensus among analysts is that we shouldn't compare GA4's data to UA's. I personally don't agree with that consensus, since it's always good to know where the difference comes from, but it has become almost an industry standard today.
I posted a finding to this thread.
Had the same issue with purchase events differing between UA and GA4.
Universal Analytics was always showing higher numbers, and the triggers were exactly the same.
Then I enabled the data export to BigQuery, and it turned out that the GA4 UI only shows those transactions that have a value for the field user_pseudo_id (you only see this field in the BigQuery data export). There were transactions where the field was null, and apparently these don't show up in the UI.
I would recommend looking at the raw events in BigQuery; the data export is free as long as you don't go crazy with ETLs and queries.
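For what it's worth, a hedged sketch of that kind of check against the GA4 BigQuery export (the project and dataset names are placeholders; user_pseudo_id and the events_* wildcard tables are part of the standard export schema):

```typescript
// Compare purchase events with and without a user_pseudo_id in the GA4 export.
// "my-project.analytics_123456" is a placeholder for your own export dataset.
import { BigQuery } from "@google-cloud/bigquery";

async function comparePurchaseCounts(): Promise<void> {
  const bigquery = new BigQuery();
  const query = `
    SELECT
      COUNTIF(user_pseudo_id IS NOT NULL) AS with_pseudo_id,
      COUNTIF(user_pseudo_id IS NULL)     AS without_pseudo_id
    FROM \`my-project.analytics_123456.events_*\`
    WHERE event_name = 'purchase'
      AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'`;
  const [rows] = await bigquery.query({ query });
  // Rows missing user_pseudo_id are the ones that may never show up in the GA4 UI.
  console.log(rows[0]);
}

comparePurchaseCounts().catch(console.error);
```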

Could "filling up" Google Analytics with millions of events slow down query performance / increase sampling?

Considering doing some relatively large scale event tracking on my website.
I estimate this would create up to 6 million new events per month in Google Analytics.
My questions are, would all of this extra data that I'm now hanging onto:
a) Slow down GA UI performance
and
b) Increase the amount of data sampling
Notes:
I have noticed that GA seems to be taking longer to retrieve results for longer date ranges on my website lately, but I don't know whether that has to do with the increased amount of event tracking I've been doing or not – it may be that GA is fighting for resources as it matures and as more and more people collect more and more data...
Finally, one might guess that adding events would only slow down reporting on events, but that isn't necessarily so, is it?
Drewdavid,
The amount of data being loaded will influence the speed of the GA UI, but nothing really dramatic, I would say. I am running a website/app with 15+ million events per month, and even though all the reporting is automated via the API, every now and then we need to find something specific and use the regular GA UI.
More than speed, I would be worried about sampling. That's the reason we automated the reporting in the first place, as there are some ways you can eliminate it (with some limitations). See, for instance, this post that describes using Analytics Canvas, one of my favorite tools (I am not affiliated in any way :-).
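For context, one common way to keep API results unsampled (not necessarily what Analytics Canvas does) is to split a long date range into one request per day against the Universal Analytics-era Core Reporting API and stitch the totals back together. A rough sketch, with the view ID and OAuth token as placeholders:

```typescript
// Generic sketch: one Core Reporting API (v3) request per day so each query stays unsampled.
// VIEW_ID and ACCESS_TOKEN are placeholders; error handling is omitted for brevity.
const VIEW_ID = "ga:12345678";
const ACCESS_TOKEN = "ya29.placeholder-oauth-token";

async function sessionsForDay(date: string): Promise<number> {
  const url =
    "https://www.googleapis.com/analytics/v3/data/ga" +
    `?ids=${encodeURIComponent(VIEW_ID)}` +
    `&start-date=${date}&end-date=${date}` +
    "&metrics=ga:sessions";
  const res = await fetch(url, {
    headers: { Authorization: `Bearer ${ACCESS_TOKEN}` },
  });
  const body = await res.json();
  return Number(body.totalsForAllResults["ga:sessions"]);
}

async function sessionsForRange(days: string[]): Promise<number> {
  let total = 0;
  for (const day of days) {
    total += await sessionsForDay(day); // one small query per day instead of one big sampled one
  }
  return total;
}
```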
Also, let me ask: what would be the purpose of your events? Think twice about whether you would actually use them later on...
Slow down GA UI performance
Standard Reports are precompiled and will display as usual. Reports that are generated ad hoc (because you apply filters, segments etc.) will take a little longer, but not so much that it hurts.
Increase the amount of data sampling
If by "sampling" you mean throwing away raw data, Google does not do that (I actually have that in writing from a Google representative). However the reports might not be able to resolve all data points (e.g. you get Top 10 Keywords and everything else is lumped under "other").
However those events will count towards you data limit which is ten million interaction hits (pageviews, events, transactions, any single product in a transaction, user timings and possibly others). Google will not drop data or close your account without warning (again, I have that in writing from a Google Sales Manager) but they reserve to right to either force you to collect less interaction hits or to close your account some time after they issued a warning (actually they will ask you to upgrade to Premium first, but chances are you don't want to spend that much money).
Google is pretty lenient when it comes to violations of the data limit but other peoples leniency is not a good basis for a reliable service, so you want to make sure that you stay withing the limits.

Using events as page section usage

I'm currently researching a solution to monitor the performance of specific sections of a page. For example, you have a simple page with 2 images with links to other pages. You are driving lots of traffic to this page and you are experimenting with different contents on that page.
Six months later, you want to see which section of the page performed better, and with which specific images.
Let's imagine you require a report that should tell you the following: on average, the first spot performs better, but last week the image was bad and that's why you had fewer conversions from that spot.
I'd like to use such a system on a high-traffic homepage of an eCommerce website, in order to better monitor the usage of the selling spots.
I was thinking of using Google Analytics events with a positioning scheme (splitting the page into columns and rows, giving each cell an identification ID such as a1 for column a, row 1) and keeping a local data warehouse of creatives (images, promotions, etc.), but apparently, after 10,000,000 hits per month, Analytics recommends the premium version, which is quite pricey (12k USD per month, with 1-year upfront payment).
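For illustration, a position-coded event call along those lines might look like this, assuming the classic analytics.js library is on the page; the data attributes and category name are made up:

```typescript
// Fire a position-coded event when a selling spot is clicked.
// Assumes analytics.js is already loaded; data-spot / data-creative attributes are made up.
declare const ga: (...args: unknown[]) => void;

document.querySelectorAll<HTMLAnchorElement>("a[data-spot]").forEach((link) => {
  link.addEventListener("click", () => {
    const spotId = link.dataset.spot; // e.g. "a1" = column a, row 1
    const creativeId = link.dataset.creative; // e.g. "summer-sale-banner"
    ga("send", "event", "homepage-spots", "click", `${spotId}:${creativeId}`);
  });
});
```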
I was thinking about PIWIK as an alternative, but there is no event tracking there - or am I missing something?
Looking forward to hearing your input on this matter.
You're better off with a provider like Optimizely for this use case. Still gonna be expensive, but it'll more quickly get you the information you need to make decisions.
We normally use multivariate tests or A/B tests to measure the success of user interfaces. Google Analytics has this feature, and it is free.
These links may be useful:
https://www.youtube.com/watch?v=yDWTMOC_Dp4
https://support.google.com/analytics/answer/1745147?hl=en

When is Google Analytics not good enough? [closed]

I'm trying to determine why an enterprise wouldn't want to use Google Analytics.
Here are the main reasons I've seen mentioned:
Inability to track clients that have Javascript disabled.
Lack of ownership of the statistics - Google owns the data.
Most of the web clients with Javascript disabled will probably be bots/spiders. This data is interesting, but probably not very useful.
As for the ownership issue, this is a bit paranoid IMO.
What am I missing here? When is Google Analytics not good enough?
Here are my findings from additional research:
Google Analytics is limited to 5 million page views per month - source
If a web site generates more than 5 million pageviews per month, it will need to be linked to an active AdWords account to avoid interruption of service.
Lack of / slow technical support
All Google support is handled through email and response times can take a week or more. Commercial analytics products often have much faster & personalized support.
Inability to track files (PDF's, Images, etc.)
GA relies on JavaScript, and files such as PDFs or images can't execute JavaScript. The workaround is to tag the link (see the sketch after this list of findings), but this won't track requests that go directly to the file.
Limited ability to customize
This is a selling point that I see pushed by commercial analytics tools (WebTrends). However it's never explained what customizations are denied by GA but allowed by WebTrends.
The Google Analytics EULA does not allow you to track individual users by identifying them. So if you wanted to add a custom variable for username to track how many times each user logs in, then you would be in a gray zone if not outright violating the EULA.
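As an aside, the link-tagging workaround mentioned under "Inability to track files" usually amounts to something like this (a sketch assuming the analytics.js event syntax; the selector and category names are made up, and direct hits on the file remain invisible):

```typescript
// Tag PDF links so a GA event fires on click; direct requests to the file are still untracked.
// Assumes analytics.js is loaded; "downloads"/"pdf" are made-up category/action names.
declare const ga: (...args: unknown[]) => void;

document.querySelectorAll<HTMLAnchorElement>('a[href$=".pdf"]').forEach((link) => {
  link.addEventListener("click", () => {
    ga("send", "event", "downloads", "pdf", link.href);
  });
});
```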
I use Google Analytics on about 10 sites right now and it's a great tool. In addition to all the analytics stats, you can tie it in with AdSense and it becomes a marketing/revenue tool and not just "wow look at all these cool user stats". If there were a way to track by user ID in certain circumstances (e.g. if users agreed to it, or if they work for the company that owns the site) then I would have no issues.
Besides, it's free and all you have to do is add JavaScript to the files, so give it a try and see what you think after a few months.
One reason that was, surprisingly, not posted:
timing / speed of reaction
It takes at least 4 hours (up to 24) for GA to update your data.
This is ok for me personally in most of the cases, but when reacting fast is crucial (news sites, one-off events, etc.) you may want to employ some other solution (Mint comes to mind, but it's not the only one out there of course).
Thought I'd add my two pence worth to this thread, as this is a topic close to my heart and one I've debated with colleagues for years. We've used WebTrends in house for as long as I can remember, back to version 4 of the log analyzer (how different things were back then!). Since Google Analytics came along, we've come under increasing pressure from certain parts of our business to switch, as 'it does everything we need from an analytics tool'.
Well, in many senses it does, especially these days. But I championed the integration of our CRM and web analytics tools back in 2006, and as our business isn't e-commerce (the 'conversion' happens offline, sometimes months after visitor acquisition), we need to integrate in this way to get a true picture of campaign effectiveness and a notion of ROI.
All of this means we need access to the raw data and need to be able to join visitor records on sessionID etc.; without this access we'd be screwed. I'd love it if we could roll without it, but the current requirements mean we can't, so this alone is a HUGE reason why Google Analytics is not good enough.
Over and out
For tracking desktop software or creating a white-label solution there are better options.
For white-label and integration-based analytics, I use MixPanel. For desktop software, I use Deskmetrics.
Google Analytics does not work well with mobile phones. While the iPhone and the Palm may be supported, many existing handsets do not support the JavaScript that Google uses.
If you're based in the UK, then theoretically you could be breaking the Data Protection Act by using Analytics.
If information about your users (like which web pages they're looking at) goes "outside the European Economic Area" and onto Google's servers in the US, then you're breaking the DPA.
Pretty obscure, but you did ask :)
Piwik avoids the problem because you host it on your own servers.
Lack of ownership of the statistics - Google owns the data. ... As for the ownership issue, this is a bit paranoid IMO.
One problem with it is that we can't even access the raw data. We had a use case this week where we wanted a visitor map for an executive presentation. We needed to get more flexible with how the visitor map is displayed (wanted to view the map in Google Earth plug-in). In GA, you can't. You take what they give you. You can see a map of how many visits came from each city, but you can't export a data file of cities and number of visits, to run the data through other tools. So, paranoia aside, there are significant limitations on what you can accomplish with GA.
However this is not a problem if you use Urchin, the self-hosted version of GA: you can export the data and do what you want with it. (And the exported data is richer than the web server log's, as it includes some analysis already.)
Since Piwik is open source, and pluggable, I imagine you could enhance the visitor map plug-in any way you wanted to. And export whatever data you want.
Whether this limitation affects you depends on your needs, obviously.
Update: I've now looked at the GA Data Export API, and it turns out that things you cannot do through the UI (as you can with Urchin), you can do with this API. It does look like you can export the visit data I was talking about, via a feed (although there are daily traffic caps on those requests). So sprinkle salt heavily on what I wrote above.
A couple more points that I've come across:
GA doesn't let you dig beyond full-day statistics; I would often like the ability to investigate whether a traffic dip the previous day was caused by the design update I did at 1pm or the soccer match on TV at 8pm.
GA doesn't offer a workaround for traffic spikes caused by DDoS attacks, Slashdotting etc. When I'm looking at a GA visitor graph of 2009, all I can see is the 2-million-pageview-spike on October 16th, pushing the entire rest of the year down flat against the horizontal axis of the graph. To get a meaningful graph, GA should offer the ability to trim or exclude outlying data points, or the ability to limit/bracket the graph window itself
GA doesn't have an event monitoring client (think Reinvigorate's Snoop tool)
While GA is very user-friendly, I've found it's not as granular as some of the other stats programs (or maybe I'm not looking in the right places). Before the marketing monkeys I work with began pushing GA, we were very satisfied with AWStats. The sheer scope of the data helped us on several occasions hone sites to better suit their audience. While GA is very shiny and laid out well, I personally still prefer the raw numbers like I used to get through AWStats.
Slow data processing speed - can be as low as 15-30 minutes for page views, but may be up to 48 hours for eCommerce
EULA is limiting in some cases
You won't own or have any control of the data. Google's engineers might use it (anonymously) for testing
Anything more complex requires customization - downloads and the like are taken care of without issue, but there are limits
Cross-domain tracking via the linker is faulty at best
Visit-based - proper tools report at the visitor level; GA mostly works with visit-based reporting
Limited number of custom vars used at one time (5)
No tech support, if you're realistic
Usually when there is a downtime notice, it's already gone
API limitations (4 dimensions and 10 metrics at one time, not all can be used together in addition to that)
I have many more, but at the end of the day it is a good tool for its price.
From a non-technical point of view, I think the most important issue is that some enterprises have high-level data security policies: all of the data must be controlled and managed by the company itself.
If you use Google Analytics, the data is stored on Google's servers. For certain kinds of enterprises, such as insurance or financial companies, that policy has to be followed.
I would NOT go with server logs. In fact, I have them disabled on my server. Why, you ask?
For the simple reason that every time you hit my server, that stupid logging program makes an entry in the physical log file on my HDD. So if my server gets 100,000 hits in a day, that's 100,000 HDD write operations.
You think that's cool? Well, it's not. It's slowing your server down, especially if the log file is huge.
Why would someone even consider doing that to their server? Especially when we're working so hard to minify JavaScript and CSS and make image files 2 KB smaller!
Please do yourself a favor and don't log directly on your server.
At least Google Analytics logs on Google's servers, so my server stays healthier.
I wouldn't use it for any of my sites, because you're forcing the user to accept your proprietary JavaScript code in their browser, which is bad. Also, giving your data to Google is a really bad idea.
See Piwik for something you can run yourself as free software, eliminating both of those problems.

How to create a reliable and robust page view counter in a web application?

I want to count the visits on a web page, and this page represents an element of my model, just like the Stack Overflow question page views.
How can I do this in a way that is reliable (one visit, one page view, without repetitions) and robust (thinking of performance, not just a new 'visits_count' attribute on a table)?
You can't.
Multiple people can visit your site from the same IP (which makes IP storage useless)
Multiple people can visit your site from the same PC
Multiple people can visit your site from the same browser (makes cookies useless)
The closest you can get is to store the visitor's IP in combination with a cookie, and not count that combination again in the future. There is a trade-off here: if they clear the cookie, they are a new visitor; if you only store the IP, you count whole proxies as one visitor.
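A rough sketch of that IP-plus-cookie combination (the in-memory set, cookie name, and counter are illustrative stand-ins; a real implementation would persist them in a database):

```typescript
// Count a page view once per (IP, cookie) visitor; everything in memory for the sketch.
// The visitor_id cookie name is made up; a real setup would persist the seen set.
import * as http from "http";
import { randomUUID } from "crypto";

const seenVisitors = new Set<string>();
let viewCount = 0;

http
  .createServer((req, res) => {
    const ip = req.socket.remoteAddress ?? "unknown";
    const cookieId = (req.headers.cookie || "").match(/(?:^|;\s*)visitor_id=([\w-]+)/)?.[1];
    const visitorId = cookieId ?? randomUUID();
    if (!cookieId) {
      res.setHeader("Set-Cookie", `visitor_id=${visitorId}; Path=/; Max-Age=31536000`);
    }
    const key = `${ip}|${visitorId}`;
    if (!seenVisitors.has(key)) {
      seenVisitors.add(key); // first time this IP + cookie pair shows up
      viewCount += 1;
    }
    res.end(`Views: ${viewCount}`);
  })
  .listen(3000);
```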
Another option is to use user accounts and track exactly which user viewed what page, but this is not really a good option for public sites.
I would suggest, actually, using Google Analytics. While nothing will ever be truly accurate, they do take a pretty gosh-darn good stab at it, and the level of reporting and useful information you can extract is amazing.
Not to mention being free for up to 4,000,000 page views a month.
http://www.google.com/analytics
One way to handle the performance issue is to do the calculation on write. Something like:
-- assumes `date` is a unique key on the stat table
INSERT INTO stat (date, viewcount) VALUES (CURDATE(), 1) ON DUPLICATE KEY UPDATE viewcount = viewcount + 1;
It isn't the perfect solution but could get you some of the way.
Doing it without repetitions is potentially a tricky problem. I would simply rely on sessions or cookies, but you could come up with all kinds of strategies for filtering crawlers, clients without cookie support, and so on.
