Basic site analytics doesn't tally with Google data

After being stumped by an earlier question (SO: google-analytics-domain-data-without-filtering), I've been experimenting with a very basic analytics system of my own.
MySQL table:
hit_id, subsite_id, timestamp, ip, url
The subsite_id lets me drill down to a folder (as explained in the previous question).
I can now get the following metrics:
Page Views - Grouped by subsite_id and date
Unique Page Views - Grouped by subsite_id, date, url, IP (not necessarily how Google does it! See the query sketch below.)
The usual "most visited page", "likely time to visit", etc.
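For concreteness, here are roughly the queries behind the first two metrics (a sketch only; SQLite stands in for MySQL, and I'm assuming the table is named hits, matching the columns above):

    import sqlite3  # SQLite as a stand-in; the SQL shape is the same in MySQL

    conn = sqlite3.connect("analytics.db")

    # Page views: one row per subsite per day
    page_views = conn.execute(
        """
        SELECT subsite_id, DATE(timestamp) AS day, COUNT(*) AS page_views
        FROM hits
        GROUP BY subsite_id, day
        """
    ).fetchall()

    # Unique page views: count each (url, ip) pair once per subsite per day
    unique_views = conn.execute(
        """
        SELECT subsite_id, DATE(timestamp) AS day,
               COUNT(DISTINCT url || '|' || ip) AS unique_page_views
        FROM hits
        GROUP BY subsite_id, day
        """
    ).fetchall()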
I've now compared my data to that in Google Analytics and found that Google reports lower values for each metric, i.e. my own setup is counting more hits than Google.
So I've started discounting IPs from various web crawlers: Google, Yahoo and Dotbot so far.
Short Questions:
Is it worth me collating a list of all major crawlers to discount, and is any such list likely to change regularly?
Are there any other obvious filters that Google will be applying to GA data?
What other data would you collect that might be of use further down the line?
What variables does Google use to work out entrance search keywords to a site?
The data is only going to be used internally for our own "subsite ranking system", but I would like to show my users some basic data (page views, most popular pages, etc.) for their reference.

Lots of people block Google Analytics for privacy reasons.

Under-reporting by the client-side rig versus server-side seems to be the usual outcome of these comparisons.
Here's how I've tried to reconcile the disparity when I've come across these studies:
Data sources recorded in server-side collection but not client-side:
hits from mobile devices that don't support JavaScript (this is probably a significant source of disparity between the two collection techniques; e.g., a Jan '07 comScore study showed that 19% of UK Internet users access the Internet from a mobile device)
hits from spiders and bots (which you mentioned already)
Data sources/events that server-side collection tends to record with greater fidelity (far fewer false negatives) compared with JavaScript page tags:
hits from users behind firewalls, particularly corporate firewalls: firewalls block the page tag, and some are configured to reject/delete cookies.
hits from users who have disabled JavaScript in their browsers (five percent, according to W3C data).
hits from users who exit the page before it loads. Again, this is a larger source of disparity than you might think. The most frequently cited study to support this was conducted by Stone Temple Consulting, which showed that the difference in unique visitor traffic between two identical sites configured with the same web analytics system, differing only in that the JS tracking code was placed at the bottom of the pages on one site and at the top on the other, was 4.3%.
FWIW, here's the scheme I use to remove/identify spiders, bots, etc. (a stripped-down sketch of the first heuristic follows below):
monitor requests for our robots.txt file, then filter all other requests from the same IP address + user agent (not all spiders will request robots.txt, of course, but with minuscule error, any request for this resource is probably a bot).
compare user agents and IP addresses against published lists: iab.net and user-agents.org publish the two lists that seem to be the most widely used for this purpose.
pattern analysis: nothing sophisticated here; we look at (i) page views as a function of time (i.e., clicking a lot of links with 200 ms on each page is probative); (ii) the path by which the 'user' traverses our Site, whether it is systematic and complete or nearly so (like following a back-tracking algorithm); and (iii) precisely timed visits (e.g., 3 AM each day).
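In Python, the robots.txt heuristic might look roughly like this (the log format and all names are hypothetical):

    # Stripped-down sketch: flag (ip, user_agent) pairs that fetched robots.txt,
    # then drop every other request from those same pairs.
    def bot_signatures(request_log):
        """Collect (ip, user_agent) pairs that ever fetched /robots.txt."""
        return {(ip, ua) for ip, ua, path in request_log if path == "/robots.txt"}

    def human_hits(request_log):
        """Keep only requests whose (ip, user_agent) pair never looked like a bot."""
        bots = bot_signatures(request_log)
        return [r for r in request_log if (r[0], r[1]) not in bots]

    log = [
        ("66.249.66.1", "Googlebot/2.1", "/robots.txt"),
        ("66.249.66.1", "Googlebot/2.1", "/pricing"),
        ("203.0.113.9", "Mozilla/5.0", "/pricing"),
    ]
    print(human_hits(log))  # only the Mozilla hit survives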

The biggest reasons are that users have to have JavaScript enabled and load the entire page, as the tracking code is often in the footer. AWStats and other server-side solutions like yours will capture everything. Plus, Analytics does a really good job of identifying bots and scrapers.

Related

Google Analytics - multiple data streams for multiple URLs?

I want to use Firebase Analytics on my website in order to get some statistics for the visitors of each page (I don't want to track the user journey through the site). I wanted to define multiple data streams (one for each URL) in my Google Analytics dashboard, but it warned me with the following message:
In most cases, a single web stream will meet your measurement needs. Using multiple web streams to measure different pages or sites in a single user’s journey may lead to inconsistent results.
In my case, where I want to see the statistics of my site based on its pages (URLs), should I define multiple data streams?
As the message says, it is not necessary to split streams based on the path within the site.
You can in the Google Analytics console instead filter based on that path. This gives you the best of both worlds, as you can show stats for a specific path, but also for the site in its entirety.
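For example, here is a rough sketch of pulling per-page stats restricted to a single path prefix with the GA4 Data API (this assumes the google-analytics-data Python package; the property ID and the /blog/ prefix are placeholders):

    from google.analytics.data_v1beta import BetaAnalyticsDataClient
    from google.analytics.data_v1beta.types import (
        DateRange, Dimension, Filter, FilterExpression, Metric, RunReportRequest,
    )

    client = BetaAnalyticsDataClient()  # credentials taken from the environment
    request = RunReportRequest(
        property="properties/123456789",
        dimensions=[Dimension(name="pagePath")],
        metrics=[Metric(name="screenPageViews")],
        date_ranges=[DateRange(start_date="30daysAgo", end_date="yesterday")],
        # Restrict the report to one section of the site
        dimension_filter=FilterExpression(
            filter=Filter(
                field_name="pagePath",
                string_filter=Filter.StringFilter(
                    match_type=Filter.StringFilter.MatchType.BEGINS_WITH,
                    value="/blog/",
                ),
            )
        ),
    )
    for row in client.run_report(request).rows:
        print(row.dimension_values[0].value, row.metric_values[0].value)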
I ended up using separate data streams in a similar situation where we had a multilingual site with a domain-per-language. The analytics dashboard lets you separate the data by domain, but the tools are bulky and don't seem available everywhere.
In short, creating a separate stream for data that is always going to be viewed separately can be a real convenience, even if it's not "the right way".
The main caveat from the data-streams documentation seems to be that you can miscount data. For instance, a user switching from the English site to the French site will be counted as a visitor on each rather than as a single visit. As long as you're aware of the data implications, you should be okay.

Google Analytics tracking for PDA emails

I have a requirement where I need to track whether a user clicked a link in a PDA email where the link included in the email is >900 characters.
I'm not sure if Google Analytics supports tracking in PDA emails.
If anyone has ever done this, please help me out.
Thanks
I seem to have misunderstood the question, so here is an update: Google will usually track any valid URL. The two exceptions I can think of are more theoretical than practical concerns.
Some old browsers (I think IE6 and similar vintages) have a character limit for GET requests (2048 bytes, IIRC), so very long links will not work and thus not be tracked correctly. For all practical purposes these browsers should be extinct by now.
A Google Analytics request is limited to 8096 bytes. The request has to transmit the document location as part of the payload, so if your URL is really massively oversized (technically, 8000 characters is ">900"), it would not be tracked. Again, this is hardly a practical concern (unless there is a lot of other data in that request, like e.g. Enhanced E-Commerce product impressions).
Old (and probably irrelevant) answer:
Google Analytics does not typically track actions within emails, since email clients do not usually support JavaScript (there are implementations of email open tracking via "web bugs" linked to a script that makes a Measurement Protocol request, but even that does not work particularly well).
If this is a link that points to your homepage, the typical way to track this would be via UTM parameters - i.e., you do not track the action within the email itself, but the result (the visit to your homepage).
UTM parameters (or "campaign parameters") are:
utm_medium - the kind of traffic (paid advertising, banner ads, or in your case email)
utm_source - the specific vendor (e.g. "google" if the link is from a paid Google Ad, or in your case it could be the name of the department that sent out the mail)
utm_campaign - your advertising campaign; in the case of a periodic newsletter this could be e.g. the number of the newsletter
utm_term - you usually would not use that in an email, that's reserved for when a link is a result of a search (then you would insert the search term)
utm_content - if you have multiple links with the same link target and campaign info you can add additional information (e.g. if you have the same link at the top and the bottom of your mail you could indicate the position here)
You cannot do anything dynamic, though - if you want to mark links with a specific character count, you would have to do this within your newsletter program and insert the number (a small sketch follows below). GA would then be able to pick this up from the campaign parameters.
E.g. for your use case you might construct a target URL like
www.example.com?utm_medium=email&utm_source=my_department&utm_campaign=pda_mail&utm_content=<number of characters>
and then get the information from the Acquisition reports in Google Analytics.
If the links do not point to your own homepage you would need to set up an intermediate page that tracks the utm_parameters before it redirects to the intended destination.
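For illustration, a small sketch of how the newsletter program might tag links (the function and parameter names are hypothetical):

    from urllib.parse import urlencode

    def tag_link(base_url, department, campaign, char_count):
        """Append campaign (utm_*) parameters to a newsletter link."""
        params = {
            "utm_medium": "email",
            "utm_source": department,
            "utm_campaign": campaign,
            "utm_content": char_count,  # e.g. the link's character count
        }
        return base_url + "?" + urlencode(params)

    print(tag_link("https://www.example.com/", "my_department", "pda_mail", 935))
    # https://www.example.com/?utm_medium=email&utm_source=my_department&utm_campaign=pda_mail&utm_content=935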

How can I pull data from Google Analytics to see the top pages visited from the current page?

I would like to create a small sidebar on each page of my website that contains related/popular pages with perhaps the top five pages users visit after reading the current page.
I could track and record user movements across the site myself and build the list that way, but as my site already uses Google Analytics and I know the data is there I'd rather access that if all possible.
The trouble is that I don't have the faintest idea whether it is possible or not.
Remember that the Google Analytics Reporting API is not real-time; it can take between 24 and 48 hours for the data to finish processing and become available in the API.
The Real Time Google Analytics API is real-time, but the data is only about five minutes old, and it is very limited in the dimensions and metrics you can request.
Quota: with either of those APIs you are limited to 10,000 requests per day per profile/view. I have no idea how many pages are on your site or how many users visit it, but this could quickly blow through that non-extendable quota.
Option: accept that it's not real-time data and use the Reporting API. Every night, run a request against the API to get everything for two days ago, then show your users data that's two days old. Store the data in a database; then you are serving data from your own DB and won't have an issue with the quota, as you only requested it once.
But this isn't exactly what you want, as it doesn't show a user's activity across the site. TBH, I am not sure you can use Google Analytics to track an individual user, as the data is not user-specific.
If you don't want to get involved with learning the API and develop this from the ground up, check out EmbeddedAnalytics (disclaimer: I created the service). We could provide such a widget.
You may find This Article useful. It provides the necessary query to find the "next page visited" using the page of interest as a filter. Ultimately your query would look like this:
https://www.googleapis.com/analytics/v3/data/ga?ids=ga%3Aabc&start-date=30daysAgo&end-date=yesterday&metrics=ga%3Apageviews&dimensions=ga%3ApreviousPagePath%2Cga%3AnextPagePath&sort=-ga%3Apageviews&filters=ga%3ApreviousPagePath%3D%40pricing
The query above will give you the "Next Page" along with pageviews assuming the "previous" page contains the word "pricing".
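If you want to issue that query from your own nightly job (per the caching suggestion above), here is a rough sketch; the view ID and access token are placeholders, and you would need a valid OAuth2 token:

    import requests

    ACCESS_TOKEN = "<oauth2-access-token>"  # placeholder
    resp = requests.get(
        "https://www.googleapis.com/analytics/v3/data/ga",
        params={
            "ids": "ga:abc",  # your view (profile) ID
            "start-date": "30daysAgo",
            "end-date": "yesterday",
            "metrics": "ga:pageviews",
            "dimensions": "ga:previousPagePath,ga:nextPagePath",
            "sort": "-ga:pageviews",
            "filters": "ga:previousPagePath=@pricing",
        },
        headers={"Authorization": "Bearer " + ACCESS_TOKEN},
    )
    # Each row is [previousPagePath, nextPagePath, pageviews]
    for prev_path, next_path, views in resp.json().get("rows", []):
        print(prev_path, next_path, views)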
We could easily build such report widget for you:
You would insert a javascript source code snippet into your page. The javascript would pass the page url to our server and we would return the next "most popular pages visited".
The pages could be "linkified" so that someone could click the link to go to that page.
We already have a caching mechanism in place, so each page view would not require a new query to Google (making it quicker and also staying away from the API quota mentioned above). For pages that are hardly ever looked at (e.g., less than once a week), we could make on-demand calls to get the statistics.
In my experience, the lag in the API is only a couple of hours, though it may be longer for larger sites.
Please let me know if you are interested in such widget and I can work with you.

Google Analytics: Profile Workaround

I currently have more than 50 microsites on my main website. That is, I have one main top-level domain and more than 50 microsites (and growing) in subfolders on that domain.
Previously I used separate GA web properties for the separate microsites (different GA tracking IDs), which worked fine, and I was able to track each site's activity well. However, I talked to a GA staffer over email and he told me I should switch to using a single GA web property and use multiple profiles to segment the data by subfolder/microsite. That seemed logical for a lot of reasons, the main one being the ability to track users over the entirety of the website in one GA session.
Anyway, I have one subfolder which houses an array of microsites, numbering almost 40 right now. I don't necessarily need a profile for each of these sites, but there are a couple of important ones that I need to report on individually, and on a regular basis I'd like to see how traffic to the other individual sites is doing.
So my question: is there a way in a single profile to segment data for 40+ (and growing) microsites and see month-to-month stats on each site? Is there a way I can load a profile dashboard with the stats (visits/pageviews) from each microsite? Is segmenting the data even what I should be looking at? How would you, a more advanced GA user, tackle this problem?
Many thanks for your input!
Jimdo (http://www.jimdo.com) offers a Google Analytics based statistics tool for their DIY website creator. They put hundreds of (usually low-traffic) sites in one profile, set a custom var with a unique ID per site, and query the results via the Google API, segmented by site ID (at least that is what one of their founders said during a web analytics conference a few months ago). Given that the solution works for a couple of million client sites (their claim is to host 7 million websites), segmentation based on a unique site ID seems a pretty solid idea.
Update: as custom vars are deprecated with Universal Analytics, you'd now use a custom dimension instead of a custom var. Apart from that, the approach should still work; a sketch of the segmented query follows below.
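For illustration, a rough sketch of that segmented query against the (legacy) v3 Core Reporting API, assuming each microsite writes its unique ID into custom dimension 1 (the view ID, dates and token are placeholders):

    import requests

    resp = requests.get(
        "https://www.googleapis.com/analytics/v3/data/ga",
        params={
            "ids": "ga:12345678",
            "start-date": "2015-01-01",
            "end-date": "2015-01-31",
            "metrics": "ga:pageviews,ga:sessions",
            "dimensions": "ga:dimension1",  # the unique site ID
            "sort": "-ga:pageviews",
        },
        headers={"Authorization": "Bearer <oauth2-access-token>"},
    )
    # One row per microsite: [site_id, pageviews, sessions]
    for site_id, pageviews, sessions in resp.json().get("rows", []):
        print(site_id, pageviews, sessions)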

How to determine demographics of users visiting your site?

Ad servers seem to (and do) know a lot about the user who is visiting a given webpage, leveraging behavioral and contextual targeting. I would love to be able to keep track of that data as well. In particular I would like to know:
age range
male/female
geographical info
I would like this information on a per-request basis (not a daily summary).
What is the best way to accomplish this?
Thanks!
There are vendors who specialize in characterizing your Site's traffic. Very roughly, they work by finding the closest match to your Site from among a large population of Sites for which they do in fact have detailed demographic data. To improve the matching, some of them give you a JavaScript snippet to insert into your Site's pages to collect user data and send it to their servers (more or less like web analytics code).
Quantcast is one such vendor. The link I included will take you to their page that displays sample audience demographic reports.
Crowd Science is another.
Neither of these is free (though they might have a freemium tier; I don't know).
Alexa, on the other hand, is free and offers similar data; just enter your Site's URL in their textbox, then when you get the results page, select the Audience tab.
Age and Gender: Ask your users.
Geographical Info: Use GeoIP targeting.
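For the GeoIP part, a minimal per-request lookup sketch using the MaxMind geoip2 package (one option among several; the database path and IP are placeholders):

    import geoip2.database  # pip install geoip2, plus a GeoLite2 database file

    reader = geoip2.database.Reader("/path/to/GeoLite2-City.mmdb")
    response = reader.city("203.0.113.7")
    print(response.country.iso_code)                 # e.g. "GB"
    print(response.subdivisions.most_specific.name)  # region/state
    print(response.city.name)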
You can try Hitwise, but it's a little on the pricey side IIRC
Doug's is a good answer, but Google Analytics now gives you this too, based on their acquisition of DoubleClick. So it's free.
Google Analytics Demographics & Interests
Note that no matter who you get this information from, the information is based on cross-site information. This is based on "third party cookies" which many users turn off (sometimes without knowing they are doing this) depending on their browser's security/privacy settings.
