How to debug a sudden pageview drop in Google Analytics? - google-analytics

We are a video-only news portal. Recently, with no production changes, our pageview (PV) count in GA dropped by roughly 75% or more. Interestingly, one of our key metrics, Video Views (measured by Brightcove, our CMS), didn't indicate any such catastrophe.
Data from Nielsen and our internal ad inventory also seem to disagree with GA.
What's the best way to debug this?


What's Property hits volume?

Is Property hits volume the same as pageviews?
I checked my Google Analytics and yesterday I got 200 users, but when I go to Administrator->Property->Property Configurations->Property hits volume, I see that I got 10,000 property hits yesterday.
Where is this traffic coming from?
I want to know the meaning of "Property hits volume".
Here's the definition from Google:
An interaction that results in data being sent to Analytics. Common hit types include page tracking hits, event tracking hits, and ecommerce hits.
Each time the tracking code is triggered by a user’s behavior (for example, user loads a page on a website or a screen in a mobile app), Analytics records that activity. Each interaction is packaged into a hit and sent to Google’s servers. Examples of hit types include:
page tracking hits
event tracking hits
ecommerce tracking hits
social interaction hits
Essentially, add your total number of non-unique pageviews to your total number of non-unique events over the last 30 days; the sum should closely match the property hits volume (unless you have ecommerce and social hits as well).
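As a rough illustration of that arithmetic, here is a minimal sketch that adds up hit counts by type. The numbers and the dictionary keys are made up for the example; they are not taken from any real property or from a GA export format.

```python
# Hypothetical 30-day totals per hit type (illustrative numbers only).
hits_by_type = {
    "pageview": 6200,
    "event": 3500,
    "ecommerce": 200,
    "social": 100,
}


def total_hits(counts):
    """Sum every hit type; each hit counts once, unique or not."""
    return sum(counts.values())


def pageviews_plus_events(counts):
    """The approximation described above: pageviews + events only."""
    return counts.get("pageview", 0) + counts.get("event", 0)


print(total_hits(hits_by_type))             # 10000
print(pageviews_plus_events(hits_by_type))  # 9700
```

The gap between the two figures is the ecommerce and social hits, which is why the approximation only matches closely when those are rare.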

Google Analytics Bloated Data

I manage an internal website and we recently implemented campaign tracking for our emails and homepage links to see where traffic comes from.
I set up the URLs using the Google URL builder.
The data we're receiving is very bloated. We ran a test URL with 8 people, and we received 129 "views", with an average of 9 views per day for over a month. No one clicked this link after the first day.
Our average session times were about 30 minutes, which is very strange.
My questions are:
How does Google track campaigns? If you use a tracking URL, does the cookie track views for any organic views after that?
Is there a tool we can use to only track first time visits using a campaign URL?
Admittedly, I'm fairly new to Google Analytics, but no one on our marketing analytics team was able to help.
Since you used the Google URL builder, I don't think you made any mistakes there. However, I strongly suspect that the bloated data is due to bot traffic in your account. And yes, bot traffic does increase average session duration.
So here's a set of steps I'll suggest:
1) Create 3 views in Google Analytics (It is a best practice):
Unfiltered, Master, Test
2) Check for language spam and weird referrals in your reports.
3) Add filters to the "Test" view to remove these bots and spam referrals. You'll need to write a regular expression for each of these filters. Also make sure you have enabled "bot filtering" in the view settings for the Master and Test views. (I am leaving the Unfiltered view as it is, since it is our data backup in case anything goes wrong.)
4) Check your traffic for the next few days, then try the URL test again and see the results.
5) If the results in the Test view are correct, apply the same filters to the "Master" view.
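Before putting regular expressions into a view filter, it can help to try them on a few sample values. The following is a minimal sketch of that idea in Python; the spam hostnames, the field names `referrer_host` and `language`, and both patterns are assumptions made for illustration, so build your own patterns from what actually shows up in your reports.

```python
import re

# Hypothetical spam referrer hostnames (placeholders, not real offenders).
SPAM_REFERRER = re.compile(r"(free-traffic|seo-offers)\.example$", re.IGNORECASE)

# Assumption: genuine language values look like "en" or "en-us";
# language spam usually contains spaces, dots or whole sentences.
VALID_LANGUAGE = re.compile(r"^[a-z]{2,3}(-[a-z0-9]{2,8})*$", re.IGNORECASE)


def is_spam(hit):
    """Return True if the hit matches a spam referrer or has a malformed language."""
    referrer = hit.get("referrer_host", "")
    language = hit.get("language", "")
    return bool(SPAM_REFERRER.search(referrer)) or not VALID_LANGUAGE.match(language)


hits = [
    {"referrer_host": "news.example.org", "language": "en-us"},
    {"referrer_host": "free-traffic.example", "language": "en-us"},
    {"referrer_host": "", "language": "Vote for X! secret.example"},
]

clean = [h for h in hits if not is_spam(h)]
print(len(clean))  # 1
```

Note that this sketch also treats a missing language as spam, which may be too strict for real data.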
I hope this helps.

Google AdWords Conversions not matching database entries

I've just set up a tool on a client site that users can use to request a quote from our client. To do this the user lands on a form page, fills in their details, submits and then lands on a thank-you page. Pretty basic.
I set this process up as a goal in Google Analytics, using the destination type goal: "begins with /thank-you" and shared that goal as a conversion in Google AdWords.
I decided to run a few Google AdWords ads to promote the tool. I also wanted to double-check the conversion data that AdWords gives you so I set the destination URL in Adwords to (2, 3, 4 etc. for each ad) and I configured the DB so that there was a column that tracked which URL the user was on when filling in the form (this would be the column I counted to get the number of conversions that came from AdWords so I could compare)
Further to this, I made sure that the initial URL parameters that the user landed on were stored in the session so that if the user browsed to other pages and came back to fill in the form later, it would still attribute the conversion to AdWords.
I tested this thoroughly on a staging and production environment and everything was working correctly.
I ran the campaign for a week, and when I checked, the conversion counts in the database and the ones coming from AdWords were wildly different. The DB tells me I've had 5 conversions while AdWords gives me 21.
Is there anything in the way Google uses its gclid that may be causing this issue? Or is there a problem with the way I've set up the measurement structure?
This can be caused by a few things, but I think it is a GA/AdWords issue rather than a problem with your DB/session set-up.
The gclid shouldn't influence your goal, since it is used only for AdWords/Analytics interactions; goals should not be affected in your set-up.
Probable cause: if your goal set-up only contains "begins with /thank-you", isn't it possible that you are counting all the sessions which reach the thank-you page, not just those from AdWords?
Solution: if you need to count conversions in AdWords (for performance improvements), use the AdWords conversion code on the same page. It counts only those users who click an ad and reach your thank-you page within x (default 30) days. Be sure to count only unique conversions (users by cookie).
Differences between GA/AdWords conversion count:
Google attributes conversions to the last marketing channel, where direct visits do not count as a marketing channel (if you look at their attribution flow visualization you see that the penultimate step is to check for existing campaign information for the user). So GA might overcount AdWords visits (or other campaigns) and, conversely, show fewer conversions for direct visits.
In contrast, your database probably records the last traffic channel without an elaborate attribution model, so it will show less campaign traffic.
Also, IIRC, the AdWords interface records the conversion at the time of the ad click, not at the time of the actual goal conversion, so the timeframes for the conversions differ.
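To make the attribution difference concrete, here is a minimal sketch comparing the two rules described above. The channel names and the list-of-sessions representation are assumptions for illustration; this is not how GA stores or processes its data.

```python
def last_touch(channels):
    """Credit whichever channel the converting session came from."""
    return channels[-1]


def last_non_direct(channels):
    """Credit the most recent channel that is not 'direct'; fall back to 'direct'."""
    for channel in reversed(channels):
        if channel != "direct":
            return channel
    return "direct"


# Hypothetical visitor: clicked an ad once, then came back twice by typing the URL
# and converted on the last visit.
journey = ["adwords", "direct", "direct"]

print(last_touch(journey))       # direct
print(last_non_direct(journey))  # adwords
```

A session-based database column behaves like `last_touch` once the original session has expired, which is one way the two counts can drift apart.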

Google Analytics - Why do some of my visits report 0 Pages / Visit?

I'm currently looking at my Google Analytics statistics for Berlin and I can see that I have 1 Visit but 0 Pages / Visit. Is this just some blip in the GA software, or is this likely to be a problem in my code? I can't think how someone could have visited my site without visiting one of its pages.
(This is a fairly uncommon problem, I should say. I've just noticed it now and then and was wondering what the cause of it could be.)
You have some other kind of hits.
The types of hits in GA are:
Pageviews (_trackPageview)
Events (_trackEvent)
Social Interactions (_trackSocial)
Custom timing (_trackTiming)
Set Var (_setVar) deprecated
Ecommerce items (_addItem + _trackTrans)
Ecommerce trans (_addTrans + _trackTrans)
The most common is probably _trackEvent, check your events table and try to find out why some visits get events but no pageviews.
It can also be due to bad filtering rules: some rule that filters out pageviews from a profile but not other hit types, leaving visits that contain only other hit types.
If you still use the old _setVar you can probably see the data inside the "User Defined" report. If that's the case, then it's time to remove these calls and move on.
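The idea of a visit that has hits but no pageviews can be sketched as follows. The `(session_id, hit_type)` tuples are a made-up log format for illustration only, not something GA exposes in this shape.

```python
from collections import defaultdict

# Hypothetical hit log: (session_id, hit_type).
hits = [
    ("s1", "pageview"),
    ("s1", "event"),
    ("s2", "event"),      # this visit sent an event but never a pageview
    ("s3", "pageview"),
]


def pageviews_per_session(hit_log):
    """Count pageview hits per session, keeping sessions that have none."""
    counts = defaultdict(int)
    for session_id, hit_type in hit_log:
        counts[session_id] += 1 if hit_type == "pageview" else 0
    return dict(counts)


per_session = pageviews_per_session(hits)
zero_page_sessions = [s for s, n in per_session.items() if n == 0]
print(zero_page_sessions)  # ['s2']
```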

Basic site analytics doesn't tally with Google data

After being stumped by an earlier question: SO google-analytics-domain-data-without-filtering
I've been experimenting with a very basic analytics system of my own.
MySQL table:
hit_id, subsite_id, timestamp, ip, url
The subsite_id lets me drill down to a folder (as explained in the previous question).
I can now get the following metrics:
Page Views - Grouped by subsite_id and date
Unique Page Views - Grouped by subsite_id, date, url, IP (not necessarily how Google does it!)
The usual "most visited page", "likely time to visit" etc etc.
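For reference, the two groupings above can be sketched in a few lines of Python. The sample rows are invented, and the uniqueness rule (one count per subsite, date, url and IP) simply mirrors the definition given above rather than whatever Google does.

```python
from collections import Counter
from datetime import datetime

# Hypothetical rows in the table's column order:
# (hit_id, subsite_id, timestamp, ip, url)
rows = [
    (1, 7, datetime(2010, 5, 1, 9, 0), "192.0.2.1", "/a"),
    (2, 7, datetime(2010, 5, 1, 9, 5), "192.0.2.1", "/a"),  # repeat view, same IP
    (3, 7, datetime(2010, 5, 1, 9, 6), "192.0.2.2", "/a"),
    (4, 7, datetime(2010, 5, 2, 8, 0), "192.0.2.1", "/b"),
]


def page_views(rows):
    """Every hit counts: group by (subsite_id, date)."""
    return Counter((subsite, ts.date()) for _, subsite, ts, _, _ in rows)


def unique_page_views(rows):
    """One count per distinct (subsite_id, date, url, ip), reported per (subsite_id, date)."""
    distinct = {(subsite, ts.date(), url, ip) for _, subsite, ts, ip, url in rows}
    return Counter((subsite, day) for subsite, day, _, _ in distinct)


day1 = datetime(2010, 5, 1).date()
print(page_views(rows)[(7, day1)])         # 3
print(unique_page_views(rows)[(7, day1)])  # 2
```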
I've now compared my data to that in Google Analytics and found that Google has lower values for each metric, i.e. my own setup is counting more hits than Google.
So I've started discounting IPs from various web crawlers: Google, Yahoo and Dotbot so far.
Short Questions:
Is it worth me collating a list of all major crawlers to discount, and is any list likely to change regularly?
Are there any other obvious filters that Google will be applying to GA?
What other data would you collect that might be of use further down the line?
What variables does Google use to work out entrance search keywords to a site?
The data is only going to be used internally for our own "subsite ranking system", but I would like to show my users some basic data (page views, most popular pages etc) for their reference.
Lots of people block Google Analytics for privacy reasons.
Under-reporting by the client-side rig versus server-side seems to be the usual outcome of these comparisons.
Here's how I've tried to reconcile the disparity when I've come across these studies:
Data Sources recorded in server-side collection but not client-side:
hits from mobile devices that don't support javascript (this is probably a significant source of disparity between the two collection techniques--e.g., Jan 07 comScore study showed that 19% of UK Internet Users access the Internet from a mobile device)
hits from spiders, bots (which you mentioned already)
Data Sources/Events that server-side collection tends to record with greater fidelity (far fewer false negatives) compared with javascript page tags:
hits from users behind firewalls, particularly corporate firewalls--firewalls block the page tag, plus some are configured to reject/delete cookies.
hits from users who have disabled javascript in their browsers--five percent, according to the W3C
hits from users who exit the page before it loads. Again, this is a larger source of disparity than you might think. The most frequently-cited study to support this was conducted by Stone Temple Consulting. It compared two identical sites configured with the same web analytics system, differing only in that the js tracking code was placed at the bottom of the pages in one site and at the top of the pages in the other; the difference in unique visitor traffic between them was 4.3%
FWIW, here's the scheme I use to remove/identify spiders, bots, etc.:
monitor requests for our robots.txt file: then of course filter all other requests from the same IP address + user agent (not all spiders will request robots.txt of course, but with minuscule error, any request for this resource is probably a bot).
compare user agent and ip addresses against published lists: and publish the two lists that seem to be the most widely used for this purpose
pattern analysis: nothing sophisticated here; we look at (i) page views as a function of time (i.e., clicking a lot of links with 200 msec on each page is probative); (ii) the path by which the 'user' traverses our site, is it systematic and complete or nearly so (like following a back-tracking algorithm); and (iii) precisely-timed visits (e.g., 3 am each day).
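Two of the heuristics above (the robots.txt check and the time-per-page check) can be sketched roughly like this. The request tuple layout, the `min_hits` threshold and the sample data are assumptions for illustration; the 200 msec figure comes from the description above. This is not the actual implementation behind that scheme.

```python
from collections import defaultdict


def find_bots(requests, max_gap=0.2, min_hits=3):
    """Flag (ip, user_agent) pairs that look automated.

    requests: iterable of (ip, user_agent, path, seconds) tuples.
    A client is flagged if it ever requested /robots.txt, or if it made at
    least min_hits requests with every gap between them <= max_gap seconds.
    """
    bots = {(ip, ua) for ip, ua, path, _ in requests if path == "/robots.txt"}

    times = defaultdict(list)
    for ip, ua, _, t in requests:
        times[(ip, ua)].append(t)

    for client, stamps in times.items():
        stamps.sort()
        gaps = [b - a for a, b in zip(stamps, stamps[1:])]
        if len(stamps) >= min_hits and all(g <= max_gap for g in gaps):
            bots.add(client)
    return bots


# Hypothetical request log.
requests = [
    ("192.0.2.1", "CrawlerX", "/robots.txt", 0.0),
    ("192.0.2.1", "CrawlerX", "/a", 5.0),
    ("192.0.2.2", "Mozilla", "/a", 0.0),
    ("192.0.2.2", "Mozilla", "/b", 40.0),
    ("192.0.2.3", "Mozilla", "/a", 1.0),
    ("192.0.2.3", "Mozilla", "/b", 1.1),
    ("192.0.2.3", "Mozilla", "/c", 1.2),
]

bots = find_bots(requests)
human_requests = [r for r in requests if (r[0], r[1]) not in bots]
print(len(human_requests))  # 2
```

Path-shape analysis and the "3 am each day" check are left out; they need more history per client than a single log slice provides.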
The biggest reasons are that users have to have JavaScript enabled and have to load the entire page, as the code is often in the footer. AWStats and other server-side solutions like yours will get everything. Plus, Analytics does a really good job of identifying bots and scrapers.
