We use GSA 7.2 and have more than 500k docs in the index, from a large number of subdomains. I am trying to identify the page from which each search was performed. The GSA is already integrated with Google Analytics. When I look in Search Terms, I can see the terms that were searched on, but I cannot tell which site in the collection the user was on, because GA records only the URI, i.e. /search?q=... I tried looking in Referral too, but with no success. Any answers?
Thanks.
I see this question is old, but I'm going to answer anyway.
The Google Search Appliance does not track the Referrer (the web page that sent the GET request).
This leaves you with two options to collect that data:
1) Insert a web proxy between your site(s) and the GSA(s). This can add 250-500 ms of latency, so don't use this option if blazing speed is a priority. You would have this proxy log the Referrer and the GET URL, so that you can match them against the reports from the GSA (a minimal sketch follows the list below).
2) Rearrange your Collections to reflect the sites that could be sending requests. You can have a max of 200 Collections without impacting performance, so this should work for you unless you have an already complicated arrangement of Collections.
a) An arrangement by Site only could look like this:
- fromIntranetSite
- fromMarketingSite
- fromTokyoSite
- fromHQIntranet
...
b) An arrangement by content and by Site could look like this:
- FAQsfromIntranetSite
- ProductsFromMarketingSite
- ResourcesFromHQIntranet
...
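For option 1, here is a minimal sketch of such a logging proxy in Python, assuming the GSA is reachable at gsa.example.com (the hostname, port, and log file are placeholders, not part of the original setup):

```python
# Minimal logging-proxy sketch: record the Referer and the GET URL, then forward
# the request to the GSA. Hostname, port and log file name are placeholder values.
import http.server
import urllib.request

GSA_HOST = "http://gsa.example.com"   # placeholder GSA address

class LoggingProxy(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        referer = self.headers.get("Referer", "-")
        # Log "<referer> <request URL>" so it can be matched against GSA reports later.
        with open("search_referrers.log", "a") as log:
            log.write(f"{referer} {self.path}\n")
        # Forward to the GSA and relay the response (no error handling in this sketch).
        with urllib.request.urlopen(GSA_HOST + self.path) as upstream:
            body = upstream.read()
            content_type = upstream.headers.get("Content-Type", "text/html")
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    http.server.HTTPServer(("", 8080), LoggingProxy).serve_forever()
```

In practice a standard reverse proxy (Apache, nginx, etc.) can do the same thing with its normal access-log format; the point is simply to capture the Referer alongside each search request.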
I currently manage quite a few Google Analytics accounts for different websites and am trying to work out how to remove certain Analytics spam from these accounts. I have previously added filters, such as excluding Russian visitors, as the businesses are local UK-based, but I am now getting a lot of traffic from:
Language - not set
&
Page - sharebutton.to
If I were to exclude the above, would that get rid of any actual visitors as well as the spam, or would it get rid of 100% spam?
If someone could help with this that would be brilliant.
Many Thanks
Paul
Filters based on countries or on the name of the spam are not effective, because both can easily be changed by the spammers.
Also, it isn't possible to filter the (not set) entries in Analytics; this label is added after the visit is recorded, when Analytics doesn't find a value for that dimension.
Instead, what you should use is:
A hostname filter. This will help prevent the majority of the spam, whether it shows up as a referral, page, language, etc., and independently of the name used by the spammer (a quick regex check follows below).
A source filter for the sneaky crawlers, which are far less frequent.
Here you will find detailed instructions on how to create the hostname filter and other measures you can take to prevent fake traffic.
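As a concrete illustration, the hostname filter is an "Include only" filter on the Hostname field whose pattern lists the hosts you actually serve. A quick way to sanity-check the regex before saving it in GA Admin (the domains below are placeholders, not taken from the question):

```python
# Sketch: test a "valid hostname" include-filter regex before adding it in
# GA Admin > View > Filters. The example.co.uk domains are placeholders.
import re

VALID_HOSTNAME_RE = re.compile(r"^(www\.)?(example\.co\.uk|shop\.example\.co\.uk)$")

samples = [
    "www.example.co.uk",      # legitimate traffic: should match
    "shop.example.co.uk",     # legitimate traffic: should match
    "sharebutton.to",         # the spam seen in the question: should not match
    "spammy-crawler.example", # made-up spam-style hostname: should not match
]

for host in samples:
    print(host, "->", "keep" if VALID_HOSTNAME_RE.match(host) else "exclude")
```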
I manage an internal website and we recently implemented campaign tracking for our emails and homepage links to see where traffic comes from.
I set up the URLs using the Google URL builder.
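For context, a URL built that way is just the landing page with utm_* parameters appended; a quick sketch of what one looks like (the domain and campaign values here are made up, not the actual ones used):

```python
# Sketch of what the URL builder produces: landing page + utm_* query parameters.
# Domain and campaign values are made-up examples.
from urllib.parse import urlencode

params = {
    "utm_source": "newsletter",
    "utm_medium": "email",
    "utm_campaign": "homepage_links_test",
}
print("https://intranet.example.com/landing-page?" + urlencode(params))
# https://intranet.example.com/landing-page?utm_source=newsletter&utm_medium=email&utm_campaign=homepage_links_test
```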
The data we're receiving is very bloated. We ran a test URL with 8 people, and we received 129 "views", with an average of 9 views per day for over a month. No one clicked this link after the first day.
Our average session times were about 30 minutes, which is very strange.
My questions are:
How does Google track campaigns? If you use a tracking URL, does the cookie then count any later organic visits toward that campaign?
Is there a tool we can use to only track first time visits using a campaign URL?
Admittedly, I'm fairly new to Google Analytics, but no one on our marketing analytics team was able to help.
Since you used the Google URL builder, I don't think you made any mistakes there. However, I strongly suspect that the bloated data is due to bot traffic in your account. And yes, bot traffic does inflate average session duration.
So here's a set of steps I'll suggest:
1) Create 3 views in Google Analytics (this is a best practice):
Unfiltered, Master, Test
2) Check for language spam and weird referrals in your reports.
3) Add filters to the "Test" view to remove these bots and spam referrals. You'll need to write a regular expression for each of these filters (an example follows the list). Also make sure you have enabled "bot filtering" in the view settings for the Master and Test views. (I am leaving the Unfiltered view as-is; it is our data backup in case anything goes wrong.)
4) Check your traffic for the next few days, then try the URL test again and see the results.
5) If the results in the Test view are correct, apply the same filters to the "Master" view.
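For step 3, a filter pattern is just a regular expression matched against the spammy sources (or languages) you see in your reports. A quick way to test one locally before pasting it into a Custom > Exclude filter in GA (the domains below are placeholders for whatever shows up in your own reports):

```python
# Sketch: test an exclude-filter regex for spam referral sources before adding it
# as a Custom > Exclude filter on Campaign Source. Domains are placeholder examples.
import re

SPAM_SOURCE_RE = re.compile(
    r"(free-buttons\.example|seo-offers\.example|traffic2cash\.example)",
    re.IGNORECASE,
)

for source in ["google", "free-buttons.example", "newsletter", "seo-offers.example"]:
    print(source, "->", "exclude" if SPAM_SOURCE_RE.search(source) else "keep")
```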
I hope this helps.
I would like to create a small sidebar on each page of my website that contains related/popular pages with perhaps the top five pages users visit after reading the current page.
I could track and record user movements across the site myself and build the list that way, but since my site already uses Google Analytics and I know the data is there, I'd rather access that if at all possible.
The trouble is that I don't have the faintest idea whether it is possible or not.
Remember that the Google Analytics Reporting API is not real-time; it can take between 24 and 48 hours for the data to finish processing and become available in the API for you to request.
The Google Analytics Real Time Reporting API is real time, but the data is only about 5 minutes old and it is very limited in the dimensions and metrics you can request.
Quota: with either of those APIs you are limited to 10,000 requests per day per profile/view. I have no idea how many pages or users your site has, but this could quickly blow through this non-extendable quota.
Option: accept that it's not real-time data and use the Reporting API. Every night, run a request against the API to get everything for two days ago, store it in your database, and then show your users data that's two days old. Because you request the data only once, you won't have an issue with the quota.
But this isn't exactly what you want, as it doesn't show a single user's activity over the site. To be honest, I am not sure you can use Google Analytics to track an individual user at all, as the data is not user-specific.
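A minimal sketch of that nightly job, assuming the Core Reporting API v3 via the google-api-python-client library and a service account; the view ID, key file, and database names are placeholders:

```python
# Sketch of a nightly job: pull pageviews per page for two days ago from the
# Core Reporting API v3 and store them locally. View ID and key path are placeholders.
import sqlite3
from google.oauth2 import service_account
from googleapiclient.discovery import build

creds = service_account.Credentials.from_service_account_file(
    "service-account.json",                              # placeholder key file
    scopes=["https://www.googleapis.com/auth/analytics.readonly"],
)
analytics = build("analytics", "v3", credentials=creds)

result = analytics.data().ga().get(
    ids="ga:12345678",                                   # placeholder view (profile) ID
    start_date="2daysAgo",
    end_date="2daysAgo",
    metrics="ga:pageviews",
    dimensions="ga:pagePath",
    sort="-ga:pageviews",
    max_results=1000,
).execute()

# Store the rows so the site reads from the local DB, not from the API.
db = sqlite3.connect("popular_pages.db")
db.execute("CREATE TABLE IF NOT EXISTS pageviews (page TEXT, views INTEGER)")
db.executemany("INSERT INTO pageviews VALUES (?, ?)",
               [(page, int(views)) for page, views in result.get("rows", [])])
db.commit()
```

Run once a night (from cron, for example), this is a single request per day, comfortably inside the quota mentioned above.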
If you don't want to get involved with learning the API and develop this from the ground up, check out EmbeddedAnalytics (disclaimer: I created the service). We could provide such a widget.
You may find This Article useful. It provides the necessary query to find the "next page visited" using the page of interest as a filter. Ultimately your query would look like this:
https://www.googleapis.com/analytics/v3/data/ga?ids=ga%3Aabc&start-date=30daysAgo&end-date=yesterday&metrics=ga%3Apageviews&dimensions=ga%3ApreviousPagePath%2Cga%3AnextPagePath&sort=-ga%3Apageviews&filters=ga%3ApreviousPagePath%3D%40pricing
The query above will give you the "Next Page" along with pageviews assuming the "previous" page contains the word "pricing".
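Decoded, the URL-encoded parameters in that query read as follows (a quick way to print them, using the same query string):

```python
# Decode the query above so its dimensions, sort and filter are readable.
from urllib.parse import urlsplit, parse_qsl

query = ("https://www.googleapis.com/analytics/v3/data/ga?ids=ga%3Aabc"
         "&start-date=30daysAgo&end-date=yesterday&metrics=ga%3Apageviews"
         "&dimensions=ga%3ApreviousPagePath%2Cga%3AnextPagePath"
         "&sort=-ga%3Apageviews&filters=ga%3ApreviousPagePath%3D%40pricing")

for key, value in parse_qsl(urlsplit(query).query):
    print(f"{key} = {value}")
# The last line prints: filters = ga:previousPagePath=@pricing
# ("=@" means "contains", i.e. the previous page contains the word "pricing")
```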
We could easily build such a report widget for you:
You would insert a JavaScript snippet into your page. The JavaScript would pass the page URL to our server, and we would return the next "most popular pages visited".
The pages could be "linkified" so that someone could click a link to go to that page.
We already have a caching mechanism in place, so each pageview would not require a new query to Google (making it quicker and also keeping clear of the API quota mentioned above). For pages that are hardly ever looked at (e.g. less than once a week), we could make "on-demand" calls to get the statistics.
In my experience with the API, the lag in the API is only a couple hours. It may be longer for larger sites.
Please let me know if you are interested in such a widget and I can work with you.
Say I have an article which has been viewed 100 times and has an average visit duration of 01:00:00. Is there any way I can break those statistics down and see how long each individual visit lasted?
(I should state that I'm not looking to find out information about particular IP addresses or anything like that. I just want to get some idea of the 'mode visit' - the time most people spent on the page.)
Google Analytics doesn't provide that level of detail about individual visitors. If you want more granular data, try CardioLog Analytics.
Yes, right, Google doesn't provide that. I tend to use SiteMeter in conjunction with Google. Not sure if I'd recommend SiteMeter, though. It does give specifics about individual visitors, but it is very flaky, and I don't think I've ever gotten a response from their so-called "tech support", or anything else from them.
The short answer is no, you can't. Google Analytics doesn't provide individual visitor details, as that would violate the GA Terms of Service.
However, there are a couple of ways to get at, or close to, this information:
1) Create an advanced segment: use the "Page" dimension and include the URI of the article on your site. Apply it and then look at the City or Service Provider report; it will show you all the visits that viewed the article (a sketch follows below).
2) Keep a copy of the tracking data sent to Google and process it with on-premises web analytics software that doesn't have the same ToS/privacy restrictions.
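If you'd rather pull that breakdown programmatically than through the UI, roughly the same segment can be applied via the Core Reporting API v3; a sketch, with the view ID, key file, and article path as placeholders:

```python
# Sketch: sessions and average session duration per city, limited to sessions that
# viewed a given article. View ID, key file and article path are placeholders.
from google.oauth2 import service_account
from googleapiclient.discovery import build

creds = service_account.Credentials.from_service_account_file(
    "service-account.json",
    scopes=["https://www.googleapis.com/auth/analytics.readonly"],
)
analytics = build("analytics", "v3", credentials=creds)

result = analytics.data().ga().get(
    ids="ga:12345678",
    start_date="30daysAgo",
    end_date="yesterday",
    metrics="ga:sessions,ga:avgSessionDuration",
    dimensions="ga:city",
    segment="sessions::condition::ga:pagePath=@/my-article",  # sessions that viewed the article
    sort="-ga:sessions",
).execute()

for city, sessions, avg_duration in result.get("rows", []):
    print(f"{city}: {sessions} sessions, avg {float(avg_duration):.0f}s")
```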
After being stumped by an earlier question (SO: google-analytics-domain-data-without-filtering), I've been experimenting with a very basic analytics system of my own.
MySQL table:
hit_id, subsite_id, timestamp, ip, url
The subsite_id lets me drill down to a folder (as explained in the previous question).
I can now get the following metrics:
Page Views - Grouped by subsite_id and date
Unique Page Views - Grouped by subsite_id, date, url, IP (not necessarily how Google does it!)
The usual "most visited page", "likely time to visit", etc. (query sketches for the first two follow below)
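A sketch of the first two metrics as queries, assuming the table above is named hits and lives in MySQL (the table name and connection details are placeholders):

```python
# Sketch of the first two metrics against the table above (assumed to be named `hits`),
# using mysql-connector-python. Connection details are placeholders.
import mysql.connector

db = mysql.connector.connect(user="analytics", password="secret", database="mysite")
cur = db.cursor()

# Page views, grouped by subsite_id and date.
cur.execute("""
    SELECT subsite_id, DATE(timestamp) AS day, COUNT(*) AS pageviews
    FROM hits
    GROUP BY subsite_id, day
""")
print(cur.fetchall())

# Unique page views: count distinct (url, ip) combinations per subsite per day.
cur.execute("""
    SELECT subsite_id, DATE(timestamp) AS day, COUNT(DISTINCT url, ip) AS unique_views
    FROM hits
    GROUP BY subsite_id, day
""")
print(cur.fetchall())
```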
I've now compared my data to that in Google Analytics and found that Google reports lower values for each metric, i.e. my own setup is counting more hits than Google.
So I've started discounting IPs from various web crawlers: Google, Yahoo and Dotbot so far.
Short Questions:
- Is it worth me collating a list of all major crawlers to discount, and is any such list likely to change regularly?
- Are there any other obvious filters that Google will be applying to GA data?
- What other data would you collect that might be of use further down the line?
- What variables does Google use to work out entrance search keywords to a site?
The data is only going to be used internally for our own "subsite ranking system", but I would like to show my users some basic data (page views, most popular pages, etc.) for their reference.
Lots of people block Google Analytics for privacy reasons.
Under-reporting by the client-side rig versus server-side seems to be the usual outcome of these comparisons.
Here's how I've tried to reconcile the disparity when I've come across these studies:
Data Sources recorded in server-side collection but not client-side:
- hits from mobile devices that don't support JavaScript (this is probably a significant source of disparity between the two collection techniques; e.g., a Jan '07 comScore study showed that 19% of UK Internet users access the Internet from a mobile device)
- hits from spiders and bots (which you mentioned already)
Data Sources/Events that server-side collection tends to record with greater fidelity (far fewer false negatives) compared with JavaScript page tags:
- hits from users behind firewalls, particularly corporate firewalls; firewalls block the page tag, plus some are configured to reject/delete cookies.
- hits from users who have disabled JavaScript in their browsers (five percent, according to the W3C data).
- hits from users who exit the page before it loads. Again, this is a larger source of disparity than you might think. The most frequently cited study to support this was conducted by Stone Temple Consulting, which showed that the difference in unique visitor traffic between two identical sites configured with the same web analytics system, but which differed only in that the JS tracking code was placed at the bottom of the pages on one site and at the top of the pages on the other, was 4.3%.
FWIW, here's the scheme I use to remove/identify spiders, bots, etc. (a rough sketch follows the list):
- Monitor requests for our robots.txt file, then filter all other requests from the same IP address + user agent (not all spiders will request robots.txt, of course, but with minuscule error, any request for this resource is probably a bot).
- Compare user agents and IP addresses against published lists: iab.net and user-agents.org publish the two lists that seem to be the most widely used for this purpose.
- Pattern analysis: nothing sophisticated here; we look at (i) page views as a function of time (i.e., clicking a lot of links with 200 ms on each page is probative); (ii) the path by which the 'user' traverses our site, i.e. whether it is systematic and complete or nearly so (like following a backtracking algorithm); and (iii) precisely timed visits (e.g., 3 am each day).
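A rough sketch of the first two checks, assuming the access log has already been exported to CSV and the published bot lists to a patterns file (the file names and log format are placeholders):

```python
# Rough sketch of checks 1 and 2 above: flag (ip, user_agent) pairs that requested
# robots.txt or whose user agent matches a published bot list. File names are placeholders.
import csv
import re

# One bot user-agent pattern per line (e.g. exported from the iab.net or
# user-agents.org lists mentioned above).
with open("known_bot_patterns.txt") as f:
    bot_patterns = [re.compile(line.strip(), re.IGNORECASE) for line in f if line.strip()]

with open("access_log.csv") as f:        # assumed columns: ip, user_agent, url
    rows = list(csv.DictReader(f))

flagged = set()                          # (ip, user_agent) pairs to discount
for row in rows:
    key = (row["ip"], row["user_agent"])
    if row["url"].endswith("/robots.txt"):                          # check 1
        flagged.add(key)
    elif any(p.search(row["user_agent"]) for p in bot_patterns):    # check 2
        flagged.add(key)

clean = [r for r in rows if (r["ip"], r["user_agent"]) not in flagged]
print(f"kept {len(clean)} of {len(rows)} requests")
```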
The biggest reasons are that users have to have JavaScript enabled and load the entire page, as the tracking code is often in the footer. AWStats and other server-side solutions like yours will capture everything. Plus, Analytics does a really good job of identifying bots and scrapers.