I have a service with a large share of requests with an empty value for HTTP_REFERER. I'd like to interpret this correctly and wonder about the most common reasons for that.
I understand that HTTP_REFERER is an optional header field, but most browsers with default settings seem to send it.
Common reasons I have found so far:
proxies
robots
JavaScript links (all of them? Is this browser-dependent?)
requests from bookmarks or from the browser startup page
user entered the URL manually
Flash links
links from a different app, like an email client
browser settings or privacy browser add-ons
some personal firewalls filter referrers
most browsers send no referrer if the redirect happens via the semi-official Refresh HTTP header
referrer-faking tools
What's missing, irrelevant, or wrong?
Is it possible to put percentages behind these items? Or maybe sort the list and point out the proportions?
A percentage will depend on what your website is and why people may want to fake their referrer. Also, some people just crack open a new tab without a homepage, or land via something other than the browser (such as an add-on or a chat link).
If your functionality relies on the referrer, use a cookie or rethink the design, because you can't rely on it.
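As an illustration, a first-party cookie captured on the visitor's first hit survives all the referrer-stripping cases listed above. A minimal sketch using Node/Express (the cookie name, lifetime, and route are my own placeholders):

```javascript
// Minimal Express sketch: capture the referrer once, on the first hit,
// into a first-party cookie instead of trusting later Referer headers.
// "trk_src" is a placeholder cookie name.
const express = require('express');
const cookieParser = require('cookie-parser');

const app = express();
app.use(cookieParser());

app.use((req, res, next) => {
  if (!req.cookies.trk_src) {
    // Referer may legitimately be empty (bookmarks, typed URLs, proxies, ...).
    const source = req.get('Referer') || 'direct';
    res.cookie('trk_src', source, {
      maxAge: 30 * 24 * 60 * 60 * 1000, // 30 days
      httpOnly: true,
    });
  }
  next();
});

app.get('/', (req, res) => {
  // On the very first request the cookie hasn't round-tripped yet.
  res.send(`Attributed source: ${req.cookies.trk_src || '(first visit)'}`);
});

app.listen(3000);
```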
Basically, all page requests that do not involve the user clicking a link on a webpage.
It all depends, and we don't have enough information to say which of the causes is most likely. I'd say robots, but you have to analyse the data (assuming you have server logs) and interpret it. I have no idea how popular your site is or what its purpose is, so robots may not be the number one reason.
In some cases, 301 redirects are the cause of lost referrer information.
I am creating a web widget, a page that customers can use within an HTML iframe in order to embed our experience on third-party sites and vendors.
The site will be public, and I am not willing to ask consumers to register in order to have a key or a unique identity to be passed as a query param (e.g. ?id=<unique_id>).
On the other hand, I need to track who is using the iframe. What are my options? A colleague suggested using the request headers, such as Origin, to track usage on the server side. Is that a good strategy? I'm not sure how much I can trust the Origin header.
What if I fire an event (i.e. a client-to-server call) at page load, the way analytics tools do, which logs the current page URL? Would that work from within an iframe?
I am pretty sure I am reinventing the wheel here. What would be some good recommendations?
Thanks!
For others looking at a similar solution: my fix was simply to hook proper client-side analytics into the page and trigger a page-load event that pushes not just the page URL but quite a few other properties to our analytics.
Also, we added a clientId query param to our URLs, so that we could identify precisely who was serving the iframe visited by the user.
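For anyone wanting the client-side half of this: inside the iframe, a load-time beacon along these lines works, because document.referrer in an embedded frame is normally the embedding page's URL. The /track endpoint and the payload shape here are placeholders, not any particular analytics product's API:

```javascript
// Runs inside the embedded iframe; sends a fire-and-forget hit on load.
window.addEventListener('load', () => {
  const payload = JSON.stringify({
    page: location.href, // the iframe URL, including ?clientId=...
    clientId: new URLSearchParams(location.search).get('clientId'),
    embedder: document.referrer || 'unknown', // usually the embedding page
    ts: Date.now(),
  });
  // sendBeacon queues the request even across unloads; fall back to fetch.
  if (!(navigator.sendBeacon && navigator.sendBeacon('/track', payload))) {
    fetch('/track', { method: 'POST', body: payload, keepalive: true });
  }
});
```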
Recently I've been experiencing a large amount of (what I think is) ghost traffic.
I need help in creating a filter to exclude this traffic from my Google Analytics.
URLs are showing up that have other websites appended to them.
Almost all articles I've read mention including only relevant hostnames, but this doesn't seem to apply to my situation.
The URLs have other random website addresses in them (overwolf.com, evite.com, shmoop.com, and many others).
The hostnames, on the other hand, are not out of the ordinary. I suspect this ghost traffic is using my main domain, judging by the huge number of users.
I posted the same question on Stack Exchange, and someone there was able to help me:
https://webmasters.stackexchange.com/a/118666/94264
"Almost all the analytics spammers insert data into your stats by pinging the GA tracker directly with fake data. They never visit your site and they usually just guess at the tracking id without knowing website host name associated with it. They won't send a host name, so it wouldn't appear in that report. See How to fight off Google Analytics referrer spammers?
That appears not be the case here. In this case these appear to be actual hits to your website. I tried one of those "top active pages" and it gives a 404 error. It looks like your 404 template has the GA tricking snippet installed on it. I don't think that is best practice. You could try taking the snippet off your 404 page. Then if you did get actual hits to such URLs, GA wouldn't count them as pages."
This can happen when there are search-and-replace or advanced filters. Are there filters on your view that alter the Request URI?
EDITED AFTER IT WAS CONFIRMED THAT THERE WERE NO FILTERS:
Typically, tracking 404 pages is best practice (referring to your other post).
I don't believe that removing the tracking from that page will help anyway. As the other poster mentioned, these hits are sent by bots most of the time, and they never actually land on your site. The hit is sent directly to your property with an HTTP call. It bypasses the site completely, so whether there is a 404 page or not, the hit will show up in GA.
Try adding an exclusion filter to exclude traffic with a page path (not hostname) ending in ".com".
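In the Universal Analytics admin that would be a custom filter along these lines (the regex is my reading of "ending in .com"):

```
Filter Type:    Custom > Exclude
Filter Field:   Request URI
Filter Pattern: \.com$
```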
For the security of my website, is there any way I can distinguish between bots and human visitors?
Not really. If a bot WANTS to be recognized as a bot, yes you can; for example, search engine bots like Googlebot identify themselves.
BUT it's extremely easy for a bot to identify itself as a normal browser; then you're stuck.
If you want a list of bots, here you go: http://www.robotstxt.org/db.html
The only way to do this might be to check the User-Agent header sent in the HTTP request by the current client.
Some bots do not specify any, while others specify a specific one such as Googlebot (Googlebot, Mozilla/5.0) or Baidu Spider.
There is also a list maintained by useragentstring which lists the known user agents used by various bots, automated scripts, and browsers.
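As a rough illustration of how little this buys you, here is a naive User-Agent check (Node-style JavaScript; the pattern list is deliberately tiny, and any bot can simply send a browser-like string):

```javascript
// Naive User-Agent sniffing: only catches bots that CHOOSE to identify
// themselves. The pattern list is a placeholder, not an exhaustive one.
const KNOWN_BOT_PATTERNS = /bot|crawler|spider|slurp|baiduspider/i;

function looksLikeBot(userAgent) {
  if (!userAgent) return true; // many scripts send no User-Agent at all
  return KNOWN_BOT_PATTERNS.test(userAgent);
}

// Example usage as Express middleware:
// app.use((req, res, next) => {
//   req.isProbablyBot = looksLikeBot(req.get('User-Agent'));
//   next();
// });
```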
Here's the scenario:
I have a mailing list that contains a PDF download link. The PDF contains ads with clickable links. I need to get analytics data on the link clicks, preferably via Google Analytics (due to the richness of the information available).
The solution I have in mind is for the link to go to a web page that I host with some sort of ad-specific token. GA records the request and then I use a client-side technique to redirect to the actual target URL. The redirect page serves no purpose other than to track the click and so I'm not worried about it being perceived as cloaking by search engines.
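Concretely, the script on that redirect page would be something like the sketch below, assuming the standard analytics.js ga() snippet is loaded above it. The ?ad= and ?to= parameter names are just placeholders, and in practice the target would need validating against a whitelist to avoid an open redirect:

```javascript
// Sketch of the tracking/redirect page's script.
var params = new URLSearchParams(location.search);
var target = params.get('to'); // validate against known ad URLs in real use!

// Record the click, then navigate once GA confirms the hit...
ga('send', 'event', 'pdf-ad', 'click', params.get('ad'), {
  hitCallback: function () { location.replace(target); }
});

// ...with a safety net in case the hit never completes (ad blockers etc.).
setTimeout(function () { location.replace(target); }, 1500);
```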
What I want to know is:
Are there any alternative ways to achieve the tracking without using an intermediate redirect page (could I perhaps call GA server-side somehow)?
If I do use the redirect page approach, what potential pitfalls could I encounter?
Thanks in advance for any advice.
I don't know what server-side environment/language you use, but in PHP, for instance, you can use cURL to send an image request to Google with the custom code appended to the URL. The easiest way to do it is to output the JavaScript with your custom code, capture the image request URL with a sniffer, and then replicate the format for your cURL request. Make sure to send header info, including fake browser info, so GA doesn't weed it out as a bot. Then forward to the ad URL. That way you don't need to output a page.
Yeah, you still have a 'redirect' happening, but you cut out having the client download a page, worrying about JavaScript being disabled, etc.
Unfortunately, there really isn't anything better you can do.
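For what it's worth, the documented way to do the same thing is the Universal Analytics Measurement Protocol, which takes the image-request trick official. A minimal sketch in Node rather than PHP (the tid and cid values are placeholders):

```javascript
// Minimal Node sketch: log the click server-side via the Universal
// Analytics Measurement Protocol, then 302 to the ad URL. No page is
// ever rendered. UA-XXXXX-Y and the client id are placeholders.
const https = require('https');

function trackAndRedirect(req, res, adUrl) {
  const hit = new URLSearchParams({
    v: '1',            // protocol version
    tid: 'UA-XXXXX-Y', // your property id (placeholder)
    cid: '555',        // anonymous client id (placeholder)
    t: 'event',
    ec: 'pdf-ad',
    ea: 'click',
    el: adUrl,
  }).toString();

  const ga = https.request({
    hostname: 'www.google-analytics.com',
    path: '/collect',
    method: 'POST',
    headers: { 'Content-Length': Buffer.byteLength(hit) },
  });
  ga.on('error', () => {}); // never let tracking failures block the redirect
  ga.end(hit);

  res.writeHead(302, { Location: adUrl });
  res.end();
}
```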
When you sign up for Google Analytics, it instructs you to use a JavaScript snippet on every page you want to track. This code includes an API key, which is visible to everyone who views your source code.
How does it guarantee that the request is coming from the real site, and not from a third party who wants to mess with your statistics? Does it check the HTTP Referer header? Even that is not safe, as it can be forged.
GA doesn't (to the best of my knowledge) attempt to verify that the site ID (the UA-XXXXX-XX code) matches a domain specified in the GA setup - I think this is a good thing, as you can track a bunch of related sites as though they were a single site (think single-product minisites, for example). However, this does leave the GA profile open to accidental or malicious use of the UA code on other unrelated sites.
The easiest way to fix this is to add a filter to the GA profile which restricts reported data to a specified set of hostnames. This will clean out the accidental-typo problem, but malicious types would be able to work around it if they were really interested (though they'd be more likely to grief your PPC campaigns instead).
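For example, as an include filter (Universal Analytics admin; the hostname regex is a placeholder for your own domains):

```
Filter Type:    Custom > Include
Filter Field:   Hostname
Filter Pattern: ^(www\.)?example\.com$
```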