I recently noticed that Google updated the PageSpeed Insights report page. There is now an "Origin Summary".
What exactly does it mean? Why is it slightly different from the field data?
If the page you are testing has enough visitors, you will see your "field data". That is real-world performance for that page.
The origin summary is real world performance data for all pages on the domain where there is enough data.
Basically it is page performance (Field Data) vs site performance as a whole (Origin Summary).
Say I have an article which has been viewed 100 times and has an Average Visit Duration of 01:00:00 hrs. Is there any way I can break down those statistics - and see how long each individual visit lasted for?
(I should state that I'm not looking to find out information about particular IP addresses or anything like that. I just want to get some idea of the 'mode visit' - the time most people spent on the page.)
Google Analytics doesn't provide detailed enough insights into individual visitors. If you want more granular data, try CardioLog Analytics.
Yes, right, Google doesn't provide that. I tend to use Sitemeter in conjunction with Google. Not sure I'd recommend Sitemeter though. It does give specifics about individual visitors, but they are very flaky. I don't think I've ever gotten a response from their so-called "tech support" or anything else from them.
The short answer is no, you can't. Google Analytics doesn't provide individual visitor details as it violates the GA Terms of Service.
However there are a couple ways to get at or close to this information:
1) Create an advanced segment - use the "Page" dimension and include the URI of the article on your site. Apply it and then look at the city or service provider report - it will show you all visits that viewed the article.
2) Keep a copy of the tracking data sent to Google and process it with on premises web analytics software that doesn't have the same ToS/privacy restrictions.
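If you do end up with per-visit durations (for example, from your own copy of the tracking data as in option 2), the "mode visit" the question asks about is straightforward to compute. Below is a minimal sketch in Python, assuming a hypothetical list of visit durations in seconds. The function name and the 30-second bucket size are illustrative choices, not anything Google Analytics provides. Durations are bucketed because raw second counts rarely repeat exactly.

```python
from collections import Counter


def mode_visit_bucket(durations_sec, bucket_sec=30):
    """Return (bucket_start, bucket_end, count) for the most common duration bucket."""
    if not durations_sec:
        raise ValueError("no visits to summarise")
    # Map each duration to a bucket index, e.g. 45 s -> bucket 1 (30-60 s).
    buckets = Counter(int(d) // bucket_sec for d in durations_sec)
    bucket, count = buckets.most_common(1)[0]
    return bucket * bucket_sec, (bucket + 1) * bucket_sec, count


# Hypothetical data: most visits are short, two very long ones inflate the average.
visits = [20, 25, 40, 45, 50, 55, 58, 70, 3600, 7200]
mean_sec = sum(visits) / len(visits)
start, end, count = mode_visit_bucket(visits)
print(f"average: {mean_sec:.1f} s, most common visit length: {start}-{end} s ({count} visits)")
```

With data shaped like this, the average is dominated by a couple of outliers while the modal bucket shows what most visitors actually did, which is why the average alone can be misleading.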
We have been asked to improve the performance of a client's site search. Before we start, we would like to set benchmarks. I don't have control over this setting, so I have asked the client whether they are comfortable with enabling anonymous data sharing, which would give us access to industry benchmarks: http://support.google.com/analytics/bin/answer.py?hl=en&answer=1011397. However, it sounds like things have changed in the Google Analytics camp and these reports are now only available via a newsletter. Is this true?
Also, will these reports give me industry standards to compare my client's current search performance against? Or is there another service that has these baseline standards available?
Here's an example of the data we are interested in. This is our client's current search performance:
Visits with Search: 772
Total Unique Searches: 1,093
Results Page Views/Search: 1.36
% Search Exits: 56.45%
% Search Refinements: 24.78%
Time after Search: 00:01:40
Search Depth: 0.59
I work at a large ecommerce site, and I asked our AdWords rep about this, having recently wanted access to this kind of data myself.
He said that benchmarking was removed 3/15/11, at which point they were experimenting with a monthly newsletter format to deliver the same kind of data.
From what I've seen they may have done one newsletter before (quietly) retiring it completely. I never saw the newsletter, but I think I remember reading reports of people who did receive one.
Disappointing to know they had access to all that data, but pulled the plug on the program. I wonder if they killed it due to data integrity concerns: they can't guarantee correct tracking-code installations on all the sites opting in, so what is the data worth if it's of questionable quality? iono... just a total guess.
We used to use Coremetrics here, and they had an opt-in benchmarking program. So if you know any other webmasters using Coremetrics, you could probably ask them to pull some benchmarking info.
We were able to get some benchmarking data from fireclick.com, but none of it (that I've seen anyways) covers on-site search. Mainly just top line metrics. :-/
So the search for benchmark data continues...
Ad servers seem to know (and do know) a lot about the user who is visiting a certain webpage, leveraging behavioral and contextual targeting. I would love to be able to keep track of that data as well. In particular I would like to know:
age range
male/female
geographical info
I would like this information on a per request basis (not a daily summary)
What is the best way to accomplish this?
Thanks!
There are vendors who specialize in characterizing your Site's traffic. Very roughly they work by finding the closest match to your Site from among a large population of Sites in which they do in fact have detailed demographic data. To improve the matching, some of them give you a javascript snippet to insert into your Site's pages to collect user data and send it to their servers (more or less like web analytics code).
Quantcast is one such vendor. The link I included will take you to their page that displays sample audience demographic reports.
Crowd Science is another.
Neither of these is free (though they might have a freemium service, I don't know).
Alexa, on the other hand, is free and offers similar data; just enter your Site's url in their textbox, then when you get the results page, select the Audience tab.
Age and Gender: Ask your users.
Geographical Info: Use GeoIP targeting.
You can try Hitwise, but it's a little on the pricey side IIRC
Doug's is a good answer, but Google Analytics now gives you this too, based on their acquisition of DoubleClick. So it's free.
Google Analytics Demographics & Interests
Note that no matter who you get this information from, the information is based on cross-site information. This is based on "third party cookies" which many users turn off (sometimes without knowing they are doing this) depending on their browser's security/privacy settings.
After being stumped by an earlier question: SO google-analytics-domain-data-without-filtering
I've been experimenting with a very basic analytics system of my own.
MySQL table:
hit_id, subsite_id, timestamp, ip, url
The subsite_id lets me drill down to a folder (as explained in the previous question).
I can now get the following metrics:
Page Views - Grouped by subsite_id and date
Unique Page Views - Grouped by subsite_id, date, url, IP (not necessarily how Google does it!)
The usual "most visited page", "likely time to visit" etc etc.
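The two groupings above can be sketched as queries. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 with an in-memory database as a stand-in for the MySQL table (MySQL would use `DATE(timestamp)` in the same way), the table name `hits` is made up, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database
conn.execute(
    "CREATE TABLE hits (hit_id INTEGER PRIMARY KEY, subsite_id INTEGER, "
    "timestamp TEXT, ip TEXT, url TEXT)"
)
sample_hits = [  # invented rows: (subsite_id, timestamp, ip, url)
    (1, "2011-03-01 10:00:00", "192.0.2.1", "/a"),
    (1, "2011-03-01 10:05:00", "192.0.2.1", "/a"),  # repeat view, same visitor
    (1, "2011-03-01 11:00:00", "198.51.100.2", "/a"),
    (2, "2011-03-01 12:00:00", "192.0.2.1", "/b"),
    (1, "2011-03-02 09:00:00", "192.0.2.1", "/a"),
]
conn.executemany(
    "INSERT INTO hits (subsite_id, timestamp, ip, url) VALUES (?, ?, ?, ?)",
    sample_hits,
)

# Page views: every hit counts, grouped by subsite and calendar day.
page_views = conn.execute(
    "SELECT subsite_id, date(timestamp) AS day, COUNT(*) "
    "FROM hits GROUP BY subsite_id, day ORDER BY subsite_id, day"
).fetchall()

# Unique page views: one count per distinct (subsite, day, url, ip) combination.
unique_page_views = conn.execute(
    "SELECT subsite_id, day, COUNT(*) FROM ("
    "  SELECT DISTINCT subsite_id, date(timestamp) AS day, url, ip FROM hits"
    ") GROUP BY subsite_id, day ORDER BY subsite_id, day"
).fetchall()

print(page_views)
print(unique_page_views)
```

Counting uniques by IP treats everyone behind a shared address as one visitor, so these numbers should not be expected to match a cookie-based tool exactly.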
I've now compared my data to that in Google Analytics and found that Google has lower values for each metric, i.e. my own setup is counting more hits than Google.
So I've started discounting IPs from various web crawlers: Google, Yahoo and Dotbot so far.
Short Questions:
Is it worth me collating a list of all major crawlers to discount, and is any such list likely to change regularly?
Are there any other obvious filters that Google will be applying to GA data?
What other data would you collect that might be of use further down the line?
What variables does Google use to work out entrance search keywords to a site?
The data is only going to be used internally for our own "subsite ranking system", but I would like to show my users some basic data (page views, most popular pages etc.) for their reference.
Lots of people block Google Analytics for privacy reasons.
Under-reporting by the client-side rig versus server-side seems to be the usual outcome of these comparisons.
Here's how I've tried to reconcile the disparity when I've come across these studies:
Data Sources recorded in server-side collection but not client-side:
hits from mobile devices that don't support javascript (this is probably a significant source of disparity between the two collection techniques--e.g., Jan 07 comScore study showed that 19% of UK Internet Users access the Internet from a mobile device)
hits from spiders, bots (which you mentioned already)
Data Sources/Events that server-side collection tends to record with greater fidelity (far fewer false negatives) compared with javascript page tags:
hits from users behind firewalls, particularly corporate firewalls: firewalls block the page tag, and some are configured to reject/delete cookies.
hits from users who have disabled javascript in their browsers: five percent, according to the W3C data.
hits from users who exit the page before it loads. Again, this is a larger source of disparity than you might think. The most frequently cited study to support this was conducted by Stone Temple Consulting. It compared two identical sites configured with the same web analytics system, differing only in that the js tracking code was placed at the bottom of the pages on one site and at the top of the pages on the other; the difference in unique visitor traffic between them was 4.3%.
FWIW, here's the scheme i use to remove/identify spiders, bots, etc.:
monitor requests for our robots.txt file: then of course filter all other requests from the same IP address + user agent (not all spiders will request robots.txt of course, but with minuscule error, any request for this resource is probably a bot).
compare user agent and IP addresses against published lists: iab.net and user-agents.org publish the two lists that seem to be the most widely used for this purpose.
pattern analysis: nothing sophisticated here; we look at (i) page views as a function of time (i.e., clicking a lot of links with 200 msec on each page is probative); (ii) the path by which the 'user' traverses our Site: is it systematic and complete or nearly so (like following a back-tracking algorithm); and (iii) precisely-timed visits (e.g., 3 am each day).
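The robots.txt rule and the timing part of the pattern analysis can be sketched roughly as follows. This is a minimal illustration, not the actual system described above: the hit tuple layout, the function names, and the thresholds (`min_gap_sec`, `min_hits`) are all assumptions made for the example, and checks against published bot lists are left out.

```python
from collections import defaultdict


def find_bot_clients(hits, min_gap_sec=0.5, min_hits=5):
    """Flag (ip, user_agent) pairs that look automated.

    hits: list of (ip, user_agent, unix_time, path) tuples.
    """
    times_by_client = defaultdict(list)
    flagged = set()
    for ip, user_agent, ts, path in hits:
        client = (ip, user_agent)
        times_by_client[client].append(ts)
        # Rule 1: anything that asks for robots.txt is almost certainly a bot.
        if path == "/robots.txt":
            flagged.add(client)
    # Rule 2: many page views with very little time spent on each page.
    for client, times in times_by_client.items():
        if len(times) < min_hits:
            continue
        times.sort()
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        if sum(gaps) / len(gaps) < min_gap_sec:
            flagged.add(client)
    return flagged


def drop_bot_hits(hits, flagged):
    """Keep only hits whose (ip, user_agent) pair was not flagged."""
    return [h for h in hits if (h[0], h[1]) not in flagged]


# Invented sample log: a crawler, a too-fast clicker, and a human reader.
sample = [
    ("192.0.2.10", "ExampleCrawler/1.0", 0.0, "/robots.txt"),
    ("192.0.2.10", "ExampleCrawler/1.0", 60.0, "/page1"),
]
sample += [("198.51.100.7", "Mozilla/5.0", i * 0.2, f"/p{i}") for i in range(5)]
sample += [("203.0.113.5", "Mozilla/5.0", i * 30.0, f"/a{i}") for i in range(3)]

flagged = find_bot_clients(sample)
clean = drop_bot_hits(sample, flagged)
```

Keying on IP address plus user agent, as the scheme above does, avoids discarding every visitor behind a shared IP just because one client on it behaved like a bot.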
The biggest reasons are that users have to have JavaScript enabled and have to load the entire page, as the code is often in the footer. AWStats and other server-side solutions like yours will get everything. Plus, Analytics does a really good job of identifying bots and scrapers.
I was a happy customer of Google Analytics starting from the Urchin times. But something strange happened a few months ago and GA started showing a fake URL called "(other)" that is credited with between 5% and 45% of all site traffic. I've tried filtering out some URL parameters to reduce the number of pages. Currently GA shows only 150,000 pages on my site, which is well below the half million limit that some people are talking about. Still, the page "(other)" is showing as the most popular page on my site.
Is anybody else struggling with this issue? I am wondering whether this could be a scalability issue. My site has been growing over the years, and currently doing 1.25 million unique monthly visitors and over 10 million pageviews. The site itself has around half a million pages. If you are successfully using GA with a bigger website than mine, please share your story. Are you using the Sampling feature of their tracking script?
Thanks!
For a huge website like yours I would not use a free analytics service. I would use something like Webtrends or some other paid analytics. We cannot blame GA for this; after all, it's a free service ;-)
GA has page view limits too (5 million page views).
Just curious. How long did you take to add the analytics code to your pages? ;-)
In Advanced Web Metrics with Google Analytics, Brian Clifton writes that above a certain number of page views, Google Analytics is no longer able to list all the separate pages and starts aggregating the low-traffic ones under the "(other)" entry:
By default, Google Analytics collects pageview data for every visitor. For very high traffic sites, the amount of data can be overwhelming, leading to large parts of the “long tail” of information to be missing from your reports, simply because they are too far down in the report tables. You can diminish this issue by creating separate profiles of visitor segments, for example /blog, /forum, /support, etc. However, another option is to sample your visitors.
I get about 3.5 million hits a month on one of my sites using GA. I don't see (other) listed anywhere. Specifically what report are you viewing? Is (other) the title or URL of the page?
You can get a loooonnnngggg way on Google Analytics. I had a site doing about 25mm uniques/mo. and it was working for us just fine. The "other" bucket fills up when you hit a certain limit of pageviews/etc. The way around this is to create different filters on the data.
For a huge website (millions of page views per day), you should try out SnowPlow:
https://github.com/snowplow/snowplow
This will give you granular data down to the individual page URLs (unlike Google Analytics at that volume) and, because it is based on Hadoop/Hive/Infobright, it will happily scale up to billions of page views.
It's more to do with a daily limit on the number of unique values they will report on. If your site uses query string parameters, all those unique values and parameter variations are seen as separate pages and cause the report to go over the limit of 50,000 unique values in a day. To eliminate this, you should add the names of all the biggest culprit query string parameters to be ignored, making sure however not to add any search query string names if site search tracking is on.
In the Profile Settings, add them to the Exclude URL Query Parameters textbox field, delimited by commas. Once I did this, the (other) went away from the reports. It takes effect from the point they are added; previous days will still have (other) displaying.
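To see why excluding parameters helps, here is a small Python sketch of the effect. It is only an illustration of the idea, not how Google Analytics implements the setting. The parameter names (`sessionid`, `ref`, `q`) and the URLs are hypothetical, and the search parameter `q` is deliberately left out of the exclusion list, as advised above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_params(url, excluded):
    """Return url with the named query string parameters removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in excluded
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


# Hypothetical page URLs as they might be recorded in a single day.
recorded = [
    "/article?id=7&sessionid=abc",
    "/article?id=7&sessionid=def",
    "/article?id=7&ref=home",
    "/search?q=shoes&sessionid=abc",
]
excluded = {"sessionid", "ref"}  # 'q' is kept so site search still works

normalized = [strip_params(u, excluded) for u in recorded]
unique_before = len(set(recorded))
unique_after = len(set(normalized))
print(unique_before, "unique URLs before,", unique_after, "after")
```

Running a check like this over a day of your own server logs is one way to find which parameters inflate the unique page count the most before adding them to the exclusion list.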