Tracking other websites' users (Analytics) - http

I was wondering if there was any (legal) way to track websites analytics of any random site.
What I mean is simple, is there a way to actually know how many people enter Google a month?
I read around and found that Alexa extrapolates data from users with their toolbar and some cookies, im not interested in that, I mean a true measurement.
Where would you have to start if you wanted to write something like this?
I guess this is no possible but I canĀ“t really understand why, are there any illegal ways of doing this?

I read around and found that Alexa extrapolates data from users with their toolbar and some cookies, im not interested in that, I mean a true measurement.
Nope, there is no way to do that. That's why Alexa needs to rely on extrapolation.
You'll have to rely on what the sites tell you.
Some commercial sites use services like Nielsen that keep an "official" visitor count on whose basis advertising prices are calculated, but this is not the rule.

Related

HERE API - Place ID

Can I save the place ID from Here Places API, for an indefinite period of time? I tried searching the policy, but coudn't able to find the solution. We will appreciate your help.
Places data is subjected to change whenever any details of the place is modified(like website, location, phone number etc.) So there is no fool-proof way to rely on the old api output for an indefinite period of time.

Wondering how to achieve this (sharing WP page via email and tracking it)

So the following which I'm writing is just to discuss whether something like this is even possible or if any of you would have any better ideas/suggestions or understanding how this might work. I thank anyone who takes time to read this in advance and I hope I don't explain myself too incoherently:
Let's say I have a page in WordPress which has a little bit of text and a video. Basicly I would like to share that page's link or I'd want to forward that page via e-mail to a certain group of people (let's say 10-50 specifically chosen people) and I want to track who of them opened the link and for how long they were on the page or watched the video.
I would like to make this happen in a way that I wouldn't have to make 50 different pages or 50 different URLs for each person (or 50 different tracking strings for that matter). Or that I wouldn't have to take a newsletter-mailer type page in between this process.
Basicly, I would like to make the sharing/forwarding and analytics overview process as easy as possible, so that an admin or moderator wouldn't have to check too many different pages to get the info.
I really appreciate any and all feedback.
[Also really sorry if I posted this in the wrong place. Please feel free to redirect me to a corresponding slot].
Technically, Google Analytics isn't meant to be used to track this specifically- it's typically meant to track groups of anonymized users. That being said, it is capable of doing this (but may not be as automated as you had hoped).
You are correct in thinking that you'd either need to duplicate the pages or create multiple different campaign URLs.
The other thing to keep in mind is that as emails are forwarded, there is no way to update the URL after the email has been sent, so if you email me and I forward it to someone else who clicks through, you're going to think someone else is me.
One way around this would be if you know your users IP addresses (not only is that a big "if", but it can also be spoofed), or some other uniquely identifying feature (any chance these people have signed-up through your website and have actual user IDs? That'd make things infinitely easier!).
Maybe you could customize the email to add their email address as a query string? That could still require a lot of work (and you couldn't just share a single link).
Now, you can not store personally identifiable info in GA (including IP and email addresses), but at the server-level you could assign a custom dimension with a uniquely generated ID and send that to GA. Now you've got all the info you need!
Unfortunately this method only works if you can detect some kind of "fingerprint" of your users.
Unfortunately what you described isn't quite what Google Analytics was designed to do. If you wanted to get into detailed user-specific tracking, I'd advise you look into a CRM. Those systems are designed specifically for user tracking as you described.
Hope that gets you pointed in the right direction.

How to Sample Adobe Analytics (Omniture) Data

I can't find anything on the web about how to sample Adobe Analytics data? I need to integrate Adobe Analytics into a new website with a ton of traffic so the stakeholders want to sample the data to avoid exorbitant server calls. I'm using DTM but not sure if that will help or be a non-factor? Can anyone either point me to some documentation or give me some direction on how to do this?
Adobe Analytics does not have any built-in method for sampling data, neither on their end nor in the js code.
DTM doesn't offer anything like this either. It doesn't have any (exposed) mechanisms in place to evaluate all requests made to a given property (container); any rules that extend state beyond "hit" scope are cookie based.
Adobe Target does offer ability to output code based on % of traffic so you can achieve sampling this way, but really, you're just trading one server call cost for another.
Basically, your only solution would be to create your own server-side framework for conditionally outputting the Adobe Analytics (or DTM) tag, to achieve sampling with Adobe Analytics.
Update:
#MichaelJohns comment below:
We have a file that we use as a boot strap file to serve the DTM file.
What I think we are going to do is use some JS logic and cookies
around that to determine if a visitor should be served the DTM code.
Okay, well maybe i'm misunderstanding what your goal here is (but I don't think I am) but that's not going to work.
For example, if you only want to output tracking for 50% of visitors, how would you use javascript and cookies alone to achieve this? In order to know that you are only filtering 50%, you need to know the total # of people in play. By itself, javascript and cookies only know about ONE browser, ONE person. It has no way of knowing anything about those other 99 people unless you have some sort of shared state between all of them, like keeping track of a count in a database server-side.
The best you can do solely with javascript and cookies is that you can basically flip a coin. In this example of 50%, basically you'd pick a random # between 1 and 100 and lower half gets no tracking, higher half gets tracking.
The problem with this is that it is possible for the pendulum to swing 100% one way or the other. It is the same principle as flipping a coin 100 times in a row: it is entirely possible that it can land on tails all 100 times.
In theory, the trend over time should show an overall average of 50/50, but this has a major flaw in that you may go one month with a ton of traffic, another month with few. Or you could have a week with very little traffic followed by 1 day of a lot of traffic. And you really have no idea how that's going to manifest over time; you can't really know which way your pendulum is swinging unless you ARE actually recording 100% of the traffic to begin with. The affect of all this is that it will absolutely destroy your trended data, which is the core principle of making any kind of meaningful analysis.
So basically, if you really want to reliably output tracking to a % of traffic, you will need a mechanism in place that does in fact record 100% of traffic. If I were going to roll my own homebrewed "sampler", I would do this:
In either a flatfile or a database table I would have two columns, one representing "yes", one representing "no". And each time a request is made, I look for the cookie. If the cookie does NOT exist, I count this as a new visitor. Since it is a new visitor, I will increment one of those columns by 1.
Which one? It depends on what percent of traffic I am wanting to (not) track. In this example, we're doing a very simple 50/50 split, so really, all I need to do is increment whichever one is lower, and in the case that they are currently both equal, I can pick one at random. If you want to do a more uneven split, e.g. 30% tracked, 70% not tracked, then the formula becomes a bit more complex. But that's a different topic for discussion ( also, there are a lot of papers and documents and wikis out there published by people a lot smarter than me that can explain it a lot better than me! ).
Then, if it is fated that that I incremented the "yes" column, I set the "track" cookie to "yes". Otherwise I set the "track" cookie to "no".
Then in in my controller (or bootstrap, router, whatever all requests go through), I would look for the cookie called "track" and see if it has a value of "yes" or "no". If "yes" then I output the tracking script. If "no" then I do not.
So in summary, process would be:
Request is made
Look for cookie.
If cookie is not set, update database/flatfile incrementing either yes or no.
Set cookie with yes or no.
If cookie is set to yes, output tracking
If cookie is set to no, don't output tracking
Note: Depending on language/technology of your server, cookie won't actually be set until next request, so you may need to throw in logic to look for a returned value from db/flatfile update, then fallback to looking for cookie value in last 2 steps.
Another (more general) note: In general, you should beware sampling. It is true that some tracking tools (most notably Google Analytics) samples data. But the thing is, it initially records all of the data, and then uses complex algorithms to sample from there, including excluding/exempting certain key metrics from being sampled (like purchases, goals, etc.).
Just think about that for a minute. Even if you take the time to setup a proper "sampler" as described above, you are basically throwing out the window data proving people are doing key things on your site - the important things that help you decide where to go as far as giving visitors a better experience on your site, etc..so now the only way around it is to start recording everything internally and factoring those things in to whether or not to send the data to AA.
But all that aside.. Look, I will agree that hits are something to be concerned about on some level. I've worked with very, very large clients with effectively unlimited budgets, and even they worry about hit costs racking up.
But the bottom line is you are paying for an enterprise level tool. If you are concerned about the cost from Adobe Analytics as far as your site traffic.. maybe you should consider moving away from Adobe Analytics, and towards a different tool like GA, or some other tool that doesn't charge by the hit. Adobe Analytics is an enterprise level tool that offers a lot more than most other tools, and it is priced accordingly. No offense, but IMO that's like leasing a Mercedes and then cheaping out on the quality of gasoline you use.

How to determine demographics of users visiting your site?

Ad-Servers seem (and do) know a lot about the use who is visiting a certain webpage leveraging Behavioral and Contextual Targeting. I would love to be able to keep track of that data as well. In particular I would like to know:
age range
male/female
geographical info
I would like this information on a per request basis (not a daily summary)
What is the best way to accomplish this?
Thanks!
There are vendors who specialize in characterizing your Site's traffic. Very roughly they work by finding the closest match to your Site from among a large population of Sites in which they do in fact have detailed demographic data. To improve the matching, some of them give you a javascript snippet to insert into your Site's pages to collect user data and send it to their servers (more or less like web analytics code).
Quantcast is such vendor. The link i included will take you to their page that displays sample audience demographic reports.
Crowd Science is another.
Neither of these are free (though they might have a freemium service, i don't know.
Alexa, on the other hand, is free and offers similar data; just enter your Site's url in their textbox, then when you get the results page, select the Audience tab.
Age and Gender: Ask your users.
Geographical Info: Use GeoIP targeting.
You can try Hitwise, but it's a little on the pricey side IIRC
Doug's is a good answer, but Google Analytics now gives you this too, based on their acquisition of DoubleClick. So it's free.
Google Analytics Demographics & Interests
Note that no matter who you get this information from, the information is based on cross-site information. This is based on "third party cookies" which many users turn off (sometimes without knowing they are doing this) depending on their browser's security/privacy settings.

When is Google Analytics not good enough? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 7 years ago.
Improve this question
I'm trying to determine why an enterprise wouldn't want to use Google Analytics.
Here are the main reasons I've seen mentioned:
Inability to track clients that have Javascript disabled.
Lack of ownership of the statistics - Google owns the data.
Most of the web clients with Javascript disabled will probably be bots/spiders. This data is interesting, but probably not very useful.
As for the ownership issue, this is a bit paranoid IMO.
What am I missing here? When is Google Analytics not good enough?
Here are my findings from additional research:
Google Analytics is limited to 5 million page views per month - source
If a web site generates more than 5 million pageviews per month it will need linked to an active AdWords account to avoid interruption of service.
Lack of / slow technical support
All Google support is handled through email and response times can take a week or more. Commercial analytics products often have much faster & personalized support.
Inability to track files (PDF's, Images, etc.)
GA relies on Javascript and files lack the ability to execute Javascript. The workaround to this problem is to tag the link, but this won't track requests that go directly to the file.
Limited ability to customize
This is a selling point that I see pushed by commercial analytics tools (WebTrends). However it's never explained what customizations are denied by GA but allowed by WebTrends.
The Google Analytics EULA does not allow you to track individual users by identifying them. So if you wanted to add a custom variable for username to track how many times each user logs in, then you would be in a gray zone if not outright violating the EULA.
I use Google Analytics on about 10 sites right now and it's a great tool. In addition to all the analytics stats, you can tie it in with AdSense and it becomes a marketing/revenue tool and not just "wow look at all these cool user stats". If there was a way to track by user ID in certain circumstances (e.g. if user's agreed to it, or if they work for the company that owns the site) then I would have no issues.
Besides, it's free and all you have to do is add JavaScript to the files, so give it a try and see what you think after a few months.
One reason that was, surprisingly, not posted:
timing / speed of reaction
It takes at least 4 hours (up to 24) for GA to update your data.
This is ok for me personally in most of the cases, but when reacting fast is crucial (news sites, one-off events, etc.) you may want to employ some other solution (Mint comes to mind, but it's not the only one out there of course).
Thought I'd add my two pence worth to this thread, as this a topic close to my heart and one I've debated with colleagues for years. We've used webtrends in house for as long as i can remember, back to version 4 of the log analyzer (how different things were back then!). Since Google Analytics came along, we've started to come under increasing pressure from certain parts of our business to switch, as 'it does everything we need form an analytics tool'
Well, true in many senses it does, especially these days. But I championed the integration of our CRM and web analytics tools back in 2006, and as our business isn't e-commerce (the 'conversion' happens offline, sometimes months after the visitor acquisition) we need to integrate in this way to get a true picture of campaign effectiveness, and notion of ROI.
All of this means, we need access to the raw data, need to be able to join visitor records on sessionID etc, without this access we'd be screwed. I'd love it if we could roll without it, but the current requirements mean we can't, so this alone is a HUGE reason why Google analytics is not good enough.
Over and out
For tracking desktop software or creating a whitelabel solution there are better solutions.
For white label an integration based analytics, i use MixPanel. For Desktop Software, i use Deskmetrics
Google Analytics does not work well with mobile phones. While the iPhone and the Palm may be supported, many of the existing handsets do not support the javascript that Google uses.
If you're based in the UK, then theoretically you could be breaking the Data Protection Act by using Analytics.
If information about your users (like which web pages they're looking at) goes "outside the European Economic Area" and onto Google's servers in the US, then you're breaking the DPA.
Pretty obscure, but you did ask :)
Piwik avoids the problem because you host it on your own servers.
Lack of ownership of the statistics - Google owns the data.
... As for the ownership issue, this is a
bit paranoid IMO.
One problem with it is that we can't even access the raw data. We had a use case this week where we wanted a visitor map for an executive presentation. We needed to get more flexible with how the visitor map is displayed (wanted to view the map in Google Earth plug-in). In GA, you can't. You take what they give you. You can see a map of how many visits came from each city, but you can't export a data file of cities and number of visits, to run the data through other tools. So, paranoia aside, there are significant limitations on what you can accomplish with GA.
However this is not a problem if you use Urchin, the self-hosted version of GA: you can export the data and do what you want with it. (And the exported data is richer than the web server log's, as it includes some analysis already.)
Since Piwik is open source, and pluggable, I imagine you could enhance the visitor map plug-in any way you wanted to. And export whatever data you want.
Whether this limitation affects you depends on your needs, obviously.
Update: I've now looked at the GA Data Export API, and it turns out that things you cannot do through the UI (as you can with Urchin), you can do with this API. It does look like you can export the visit data I was talking about, via a feed (although there are daily traffic caps on those requests). So sprinkle salt heavily on what I wrote above.
A couple more points that I've come across:
GA doesn't let you dig beyond full-day statistics; I would often like the ability to investigate whether a traffic dip the previous day was caused by the design update I did at 1pm or the soccer match on TV at 8pm.
GA doesn't offer a workaround for traffic spikes caused by DDoS attacks, Slashdotting etc. When I'm looking at a GA visitor graph of 2009, all I can see is the 2-million-pageview-spike on October 16th, pushing the entire rest of the year down flat against the horizontal axis of the graph. To get a meaningful graph, GA should offer the ability to trim or exclude outlying data points, or the ability to limit/bracket the graph window itself
GA doesn't have an event monitoring client (think Reinvigorate's Snoop tool)
While GA is very user-friendly, I've found it's not as granular as some of the other stats programs (or maybe I'm not looking in the right places). Before the marketing monkeys I work with began pushing GA, we were very satisfied with AWStats. The sheer scope of the data helped us on several occasions hone sites to better suit their audience. While GA is very shiny and laid out well, I personally still prefer the raw numbers like I used to get through AWStats.
Slow data processing speed - Can be as low as 15-30 mins for page views, but may be up to 48 for eCommerce
EULA is limiting in some cases
You won't own or have any control of the data. Google's engineers might use it (anonymously) for testing
Anything more complex requires customization - Downloads and such care of no issue, but there are limits
Cross domain tracking by linker is faulty at best
Visit based - Proper tools are based on Visitor level, GA works on Visit based reporting mostly
Limited number of custom vars used at one time (5)
No tech support, if you're realistic
Usually when there is a downtime notice, it's already gone
API limitations (4 dimensions and 10 metrics at one time, not all can be used together in addition to that)
I have many more, but at the end of the day it is a good tool for it's price.
From the non-technical point, I think the most important is that some enterprise has the high level data security policy. All of the data should be controlled and managed by themselves.
If you use the Google analytics,the data is stored in google's server. For some special enterprise, like insurance, financial company. The policy should be followed.
I would NOT go with server logs. In fact I have them disabled on my server. Why you ask me?
For the simple reason that everytime you hit my server that stupid logging program makes an entry in the physical log file on my HDD. So if my server gets 100,000 hits in a day that's 100,000 time a HDD write operation happens.
You think that's cool? Well it's not. It's slowing your server down, specially if the log file is huge.
Why would someone even consider doing that to their server? Specially when we're working so hard to minify javascript, css and make image files 2 KB smaller!
Please do yourself a favor don't log directly on your server.
At least Google Analytics logs it on Google's server so my server's healthier.
I wouldn't use it for any of my sites, because you're forcing the user to accept your proprietary JavaScript code in their browser, which is bad. Also, giving your data is Google is a really bad idea.
See Piwiki for something you can run yourself as in free software, eliminating both of the problems.

Resources