How does Google Analytics track URL with trailing slash and without? - google-analytics

I saw in my analytics that the two URLs are tracked separately, one with a trailing / and one without. Is this caused by users typing the URL without the /, which is then tracked as-is on page load?

Yes, this may be the most plausible reason. It happens when the page is reachable both with and without a trailing slash.
You can fix it with a filter in your Google Analytics view that rewrites the URL when it does not end with a trailing slash.
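The rewrite such a filter performs can be sketched in JavaScript. This shows only the normalization logic and assumes you standardize on the trailing-slash form. normalizeTrailingSlash is a hypothetical helper, not a Google Analytics API. It deliberately leaves file-like paths such as /index.html alone.

```javascript
// Hypothetical helper: append a trailing slash to a request path so that
// "/about" and "/about/" are reported as the same page.
// Assumption: the site standardizes on the trailing-slash form.
function normalizeTrailingSlash(requestUri) {
  // Keep any query string untouched; only the path part is normalized.
  const queryStart = requestUri.indexOf("?");
  const path = queryStart === -1 ? requestUri : requestUri.slice(0, queryStart);
  const query = queryStart === -1 ? "" : requestUri.slice(queryStart);

  // Leave paths that already end in "/" or whose last segment looks like a file.
  const lastSegment = path.slice(path.lastIndexOf("/") + 1);
  if (path.endsWith("/") || lastSegment.includes(".")) {
    return requestUri;
  }
  return path + "/" + query;
}

console.log(normalizeTrailingSlash("/about"));       // "/about/"
console.log(normalizeTrailingSlash("/about/"));      // "/about/"
console.log(normalizeTrailingSlash("/search?q=ga")); // "/search/?q=ga"
console.log(normalizeTrailingSlash("/index.html"));  // "/index.html"
```

The same normalization could also be done server-side with a redirect. That would fix the URLs at the source rather than only in reports.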

Related

Wordpress urls to Xenforo urls - Redirect via htaccess

I need to redirect all my old URLs to XenForo after a WordPress migration. I have done it, but it is not working.
When it redirects the old URLs to my new XenForo site, I get a "not found" error, because all the XenForo URLs have numbers (IDs) at the end.
wp old urls:
https://www.example.com/posts/my-new-urls-for-me/
xenforo new urls:
https://www.example.com/threads/my-new-urls-for-me.1824/
I need some help to correctly redirect without errors.
XenForo URLs rely on the numeric ID at the end of the URL.
So in your example, https://www.example.com/threads/my-new-urls-for-me.1824/ ... it is the 1824 which is significant.
The my-new-urls-for-me part is irrelevant: you can literally use https://www.example.com/threads/any-thing-you-like-in-this-part.1824/ and it will still redirect to the same thread, as long as the numeric part at the end is intact.
More specifically, you can also do https://www.example.com/threads/1824/ with just the numeric part and it will still work.
What you cannot do is rely on the text part - not without custom coding.
To redirect from a WordPress URL to a XenForo URL, you need to know:
the original WordPress slug (the my-new-urls-for-me part)
the corresponding XenForo thread id that it was moved to (the 1824 part)
Then redirection becomes a simple matter of matching that slug and then redirecting to a thread with that id.
If you didn't do that at the time of migration, you'll need to do it manually now - generate a table of slugs and thread ids and use that as a lookup by your redirection script.
Unless no more than a few dozen pages were migrated, I suggest not trying to do this with .htaccess alone. Instead, use a database table and write a simple script that matches the incoming slug, looks up that entry in the table, retrieves the corresponding XenForo thread ID, and redirects to the thread with that ID.
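Here is a minimal sketch of that lookup. It assumes old URLs of the form /posts/<slug>/ and a slug-to-thread-ID table exported after the migration. slugToThreadId and xenforoRedirectTarget are hypothetical names, and the in-memory Map stands in for the database table suggested above.

```javascript
// Hypothetical lookup table built after the migration:
// WordPress slug -> XenForo thread ID (the entry is the example from the question).
const slugToThreadId = new Map([
  ["my-new-urls-for-me", 1824],
]);

// Returns the XenForo path for an old WordPress path, or null if the slug is unknown.
// Assumes old URLs look like /posts/<slug>/ (trailing slash optional).
function xenforoRedirectTarget(oldPath) {
  const match = /^\/posts\/([^\/]+)\/?$/.exec(oldPath);
  if (match === null) return null;

  const slug = match[1];
  const threadId = slugToThreadId.get(slug);
  if (threadId === undefined) return null;

  // XenForo only needs the numeric ID; keeping the slug is cosmetic.
  return `/threads/${slug}.${threadId}/`;
}

console.log(xenforoRedirectTarget("/posts/my-new-urls-for-me/")); // "/threads/my-new-urls-for-me.1824/"
console.log(xenforoRedirectTarget("/posts/unknown-post/"));       // null
```

A real redirect script would respond with a 301 to the returned path and fall back to something sensible, such as the forum index, when it gets null.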

Is an HTTP URL containing // valid?

I have a URL in the following form
https://abc.domain.com//xyz/test
and I'm fetching the contents via Volley (Android).
Is it a valid URL? Do I need to remove one / from //xyz?
Basically, it's still a valid URL.
However, both servers and search engines treat these two (with / and with //) as separate URLs.
So make sure which URL is the correct one to fetch.
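One way to confirm that the two forms really are different URLs is to parse them. The sketch below uses the WHATWG URL class available in browsers and Node.js, not Volley, which the question uses. collapseSlashes is a hypothetical helper for when you decide the single-slash form is the correct one.

```javascript
// The WHATWG URL parser keeps the empty path segment,
// so "//xyz/test" is a different path from "/xyz/test".
const withDouble = new URL("https://abc.domain.com//xyz/test");
const withSingle = new URL("https://abc.domain.com/xyz/test");

console.log(withDouble.pathname);                 // "//xyz/test"
console.log(withDouble.href === withSingle.href); // false

// Hypothetical helper: collapse repeated slashes in the path before fetching.
function collapseSlashes(urlString) {
  const url = new URL(urlString);
  url.pathname = url.pathname.replace(/\/{2,}/g, "/");
  return url.href;
}

console.log(collapseSlashes("https://abc.domain.com//xyz/test")); // "https://abc.domain.com/xyz/test"
```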

Getting unexpected data while setting up Analytics Service [duplicate]

I have several websites that get around 5% of their daily visits from spam referrers. There is one strange thing I noticed about these referrers: they show up in Google Analytics, but I cannot see them in my custom-designed table where I record every visitor to the site, so I think they only manipulate the GA code and never reach the site itself.
If you follow their link, they redirect you to some affiliates link.
I don't know whether they have impact on my SEO/SERP, but I would like to get rid of them. May I do that via htaccess file?
One peculiar aspect is that I get visitors from different forum-like pages, e.g. forum.topic221122.darodar.com, forum.topic125512.darodar.com, etc., so I would like to block the whole darodar.com domain.
Besides darodar.com, there are also econom.co and iloveitaly.co that are bothering my stats. Can I block them all from htaccess?
Most of the Spam in Google Analytics never access your site so you can't block them using any server-side solution.
Ghost spam hits GA directly and usually shows up for only a few days before disappearing. That's why some people think they blocked it with the .htaccess file, but it is just a coincidence.
This type of spam is easy to spot, because its hostname is either fake or (not set).
The other type, Crawlers like semalt, actually access your site and can be blocked from the .htaccess file, however, there are just a few of them.
So in summary, to stop spam in Google Analytics:
Crawlers: server-side solutions or filters in GA
Ghosts: ONLY filters in GA
The only efficient way to stop ghost spam is to create an include filter with all your valid hostnames.
First, build a REGEX with all your valid hostnames, something like this (you can find them in the Network report):
yoursite\.com|shoppingcart\.com|translateservice\.net
These are just examples; you might have more or fewer hostnames. Once you have the REGEX, follow these steps:
Go to the admin tab in Google Analytics
Select FILTER under the View Column > New Filter
Filter type Custom > Include > Filter Field Hostname
Filter Pattern: copy the hostname expression you built
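You can simulate what the include filter does to incoming hits with the same pattern. The hit list is hypothetical. The sketch also assumes the filter applies the pattern as an unanchored match, which is how RegExp.test() behaves.

```javascript
// The hostname expression from above; RegExp.test() matches anywhere in the string.
const validHostnames = /yoursite\.com|shoppingcart\.com|translateservice\.net/;

// Hypothetical hostnames as they might appear in the Network report.
const hostnameHits = [
  { hostname: "yoursite.com" },
  { hostname: "www.yoursite.com" },
  { hostname: "(not set)" },   // typical ghost spam
  { hostname: "darodar.com" }, // fake hostname sent by a ghost
];

// Include filter: keep only hits whose hostname matches the expression.
const kept = hostnameHits.filter((hit) => validHostnames.test(hit.hostname));
console.log(kept.map((hit) => hit.hostname).join(", ")); // "yoursite.com, www.yoursite.com"
```

Because the match is unanchored, a hostname like notyoursite.com would also pass. If that matters for your site, anchor each alternative, e.g. ^(www\.)?yoursite\.com$.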
For crawlers, you will have to create a separate filter with an expression listing all the spammers:
spammer1|spammer2|spammer3|spammer4|spammer5
Filter type Custom > Exclude > Filter Field Campaign source
Filter Pattern: copy the referral expression
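The crawler filter works the other way round: any hit whose campaign source matches the pattern is dropped. The pattern below is a hypothetical one built from the domains named in this thread. The /i flag makes it case-insensitive.

```javascript
// Hypothetical exclude expression built from the spammers mentioned in this thread.
const spamSources = /darodar|semalt|priceg|econom\.co|iloveitaly/i;

// Hypothetical campaign sources as they might appear in GA.
const sources = [
  "google",
  "forum.topic221122.darodar.com",
  "semalt.semalt.com",
  "(direct)",
];

// Exclude filter: drop every source that matches the expression.
const remaining = sources.filter((source) => !spamSources.test(source));
console.log(remaining.join(", ")); // "google, (direct)"
```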
Every time you work with filters, it is important to keep an unfiltered view.
If you need detailed steps for these solutions, you can check this complete guide about spam in Google Analytics.
Guide to stop and remove All the spam in Google Analytics
Hope it helps.
This blog post suggests that the spam referrers manipulate Google Analytics and never actually visit your site, so blocking them is pointless. Google Analytics offers filtering if you want to mitigate fake site hits.
Yes, you can block them with .htaccess, and you actually should.
Your .htaccess file could look like this:
<IfModule mod_setenvif.c>
# Set spammers referral as spambot
SetEnvIfNoCase Referer darodar.com spambot=yes
SetEnvIfNoCase Referer 7makemoneyonline.com spambot=yes
## add as many as you find
Order allow,deny
Allow from all
Deny from env=spambot
</IfModule>
When traffic comes from these sites, this .htaccess blocks it, so the HTML is never loaded and the GA script never fires for those visits.
They are trying to get traffic from you: once you see their visits in Google Analytics and try to find out where they came from, you end up visiting their URL. It is harmless to your site, except that your statistics fill up with junk data.
Google Analytics should prevent this, the same way Gmail prevents spam email.
According to this entry, they never visit your site; they fake HTTP requests to GA using your UA code. So it seems pointless to block them with .htaccess or any other method, because they never actually enter your site. They only send fake "visit" data to Google.
We have found that using .htaccess is a good way to stop this spam. I have implemented the solution below on my client's site, and it has worked really well so far.
The best way is to stop them with a "contains" condition: e.g. for the spammer priceg.com, check whether the referrer URL contains priceg.
Many of these sites create subdomains and hit you again, so when they tweak the URL, hard-coded conditions fail.
RewriteEngine On
RewriteCond %{HTTP_REFERER} (priceg) [NC,OR]
RewriteCond %{HTTP_REFERER} (darodar) [NC]
RewriteRule .* - [F]
It is explained in detail here
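The difference between a "contains" check and a hard-coded hostname can be sketched as a case-insensitive substring test. isSpamReferer and the keyword list are hypothetical. On Apache, the same idea is expressed with the [NC] RewriteCond lines in this answer.

```javascript
// Hypothetical keyword list: fragments of spam domains, not full hostnames.
const spamKeywords = ["priceg", "darodar"];

// Case-insensitive "contains" check, mirroring RewriteCond %{HTTP_REFERER} (keyword) [NC].
function isSpamReferer(referer) {
  if (!referer) return false; // no Referer header: nothing to match
  const lower = referer.toLowerCase();
  return spamKeywords.some((keyword) => lower.includes(keyword));
}

console.log(isSpamReferer("http://forum.topic221122.darodar.com/")); // true (new subdomain still caught)
console.log(isSpamReferer("https://PRICEG.com/offer"));              // true
console.log(isSpamReferer("https://www.google.com/"));               // false
console.log(isSpamReferer(undefined));                               // false
```

As other answers here point out, this only affects requests that actually reach your server; ghost hits sent straight to GA never pass through it.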
Apparently, this is done by a spammer communicating directly with Google Analytics using your website's account ID. They effectively tell Google Analytics that they visited your page when in fact they never did. They identify themselves to Analytics with a URL that THEY WANT YOU TO VISIT: you see their traffic in Google Analytics and go check them out. They will have, for example, an Amazon affiliate account hooked up, so they try to earn a commission from your Amazon purchases.
So .htaccess did nothing for me when I was fighting this one; you need to create a filter that filters out something like (.*)\.darodar\.com
The real harm I have found from this is that it invalidates my website statistics.
You can restrict access using .htaccess or by filtering ALL robot visits out of Google Analytics tracking. If that doesn't work, set up Google Analytics filtering. More details on how to do that can be found here: http://www.wiyre.com/google-analytics-darodar-forum-spam-what-is-it/
They are Russian-based but route their spiders through China and the Philippines. Since they have multiple subdomains, it may be best to block the whole IP address at this point.
Blocking any bots at your web server level makes no sense - spammers are sending fake requests to Google Analytics web server. All they have to know is website domain name and Google Analytics ID linked to it.
So you have to mask your Google Analytics ID in your website code. For example, you can do this in the Google Analytics JS code:
ga('create', 'UA-X' + 'XXXXX' + 'XX-X', 'auto');
After this change, a spammer's bot would have to execute JS code to parse your Google Analytics ID (and not many bots are able to do that).
https://nobodyonsecurity.com/security/fighting-google-analytics-referrer-spam
Using .htaccess is not the best way. On my site I use the GA option Tracking Info > Referral Exclusion List.
Regards!
Lunametrics posted a nice article to solve this issue using Google Tag Manager:
http://www.lunametrics.com/blog/2014/03/11/goodbye-to-exclude-filters-google-analytics/
I think the most effective way to avoid ghost spam is to add a custom dimension that tells you the site was actually visited, because, as we know, ghost spam never visits the site.
ga('set', 'dimension1', "Hey I'm really here!!");
ga('send', 'pageview');
Simply add these lines to your pages, then add a filter that includes only hits where the dimension has the expected value ("Hey I'm really here!!" in this case).
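The filtering side of this idea can be simulated in a few lines. The hit objects are hypothetical stand-ins, not GA's real data format. They only show why ghost hits, which never run your page's ga('set', ...) call, fail the include filter.

```javascript
// The value the page sets with ga('set', 'dimension1', ...).
const EXPECTED = "Hey I'm really here!!";

// Hypothetical hits: real ones come from the page and carry the dimension;
// ghost hits are sent straight to GA and do not.
const pageHits = [
  { page: "/", dimension1: EXPECTED },
  { page: "/about/", dimension1: EXPECTED },
  { page: "/", dimension1: undefined }, // ghost spam
];

// Include filter: keep only hits whose dimension equals the expected value.
const included = pageHits.filter((hit) => hit.dimension1 === EXPECTED);
console.log(included.length); // 2
```

The filter only works while the value stays unknown to spammers; one who reads it from your page source could send it too.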
I used these mod_rewrite rules for semalt:
RewriteEngine On
RewriteCond %{HTTP_REFERER} ^https?://(www\.)?semalt\.com [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://([^.]+\.)*semalt\.com [NC]
RewriteRule .* - [F]
or with the mod_setenvif module in .htaccess:
SetEnvIfNoCase Referer semalt.com spambot=yes
SetEnvIfNoCase REMOTE_ADDR "217\.23\.11\.15" spambot=yes
SetEnvIfNoCase REMOTE_ADDR "217\.23\.7\.144" spambot=yes
Order allow,deny
Allow from all
Deny from env=spambot
I even created an Apache, Nginx & Varnish blacklist plus Google Analytics segment to prevent referrer spam traffic, you can find it here:
https://github.com/Stevie-Ray/referrer-spam-blocker/
Filter future and historical GA spam of all types with the link provided. Hostname filtering is particularly easy.
https://www.ohow.co/ultimate-guide-to-removing-irrelevant-traffic-in-google-analytics/
2019 update
I may have a solution to this problem as I find none of the other solutions to be effective.
Let me address the problems with the existing solutions first:
Add a filter for each referrer spam domain.
How many domains will you add?
Most of these referrer spam domains exist for some time and then disappear
Maintain a blacklist of referrer spam domains.
This gets even more complicated, as they are basically endless in number.
You would have to keep updating the blacklist.
Also, the bigger the blacklist, the more time you need to scan it.
Anything else, such as maintaining a manual .htaccess, will require manual intervention, which will not scale as your site becomes more popular.
Anything automatic, such as using AI to detect patterns in how referrer spam domains appear, will be hit-or-miss.
How do these bots work?
First, it is crucial to understand how these bots work.
At the very least, they use regex patterns such as /UA-\d{6}/ to extract tracking IDs from the websites they crawl recursively, starting from a seed website.
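Here is a hypothetical scraper check using the /UA-\d{6}/ pattern quoted above. The patterns real bots use are unknown, so treat this as an assumption. It finds a plainly written ID. It does not find one split across string literals (the trick from an earlier answer) or one stored as character codes (the approach below).

```javascript
// Hypothetical scraper pattern, as quoted in the answer.
const trackingIdPattern = /UA-\d{6}/;

// Three ways a page might contain the same (made-up) ID "UA-1111111-1".
const plain = "ga('create', 'UA-1111111-1', 'auto');";
const concatenated = "ga('create', 'UA-11' + '11111' + '-1', 'auto');";
const charCodes = "var a = [85, 65, 45, 49, 49, 49, 49, 49, 49, 49, 45, 49];";

console.log(trackingIdPattern.test(plain));        // true
console.log(trackingIdPattern.test(concatenated)); // false
console.log(trackingIdPattern.test(charCodes));    // false
```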
I believe I have a solution that offers the following advantages
No need to maintain whitelists or blacklists
Will work against 99% of them easily and can always be modified to take it to 100%
Requires almost NO manual intervention
The idea is to NOT have a tracking ID at all in the script
Here is an example
script.
  //- Google Analytics ID
  var a = [85, 65, 45, 49, 49, 49, 49, 49, 49, 49, 49, 49, 45, 50];
  var newScript = document.createElement("script");
  newScript.type = "text/javascript";
  newScript.async = true;
  newScript.src = "https://www.googletagmanager.com/gtag/js?id=" + a.map(i => String.fromCharCode(i)).join("");
  document.head.appendChild(newScript);
  window.dataLayer = window.dataLayer || [];
  function gtag(){dataLayer.push(arguments);}
  gtag('js', new Date());
  gtag('config', a.map(i => String.fromCharCode(i)).join(""), { 'send_page_view': false });
  // Feature-detects Navigation Timing API support.
  if (window.performance) {
    // Gets the number of milliseconds since page load
    // (and rounds the result since the value must be an integer).
    var timeSincePageLoad = Math.round(performance.now());
    // Sends the timing event to Google Analytics.
    gtag('event', 'timing_complete', {
      'name': 'load',
      'value': timeSincePageLoad,
      'event_category': '#{title}'
    });
  }
We take a very simple approach: break the tracking ID, of the form 'UA-1111111-1', into an array of character codes.
We then construct the tracking ID dynamically from that array at any point where we need a reference to it.
The approach can be made far more complex: turning the ID into an encrypted sequence of numbers, using base 8 or hexadecimal, adding a fixed offset or a random offset on each run, or RSA-encrypting the tracking ID with a private key on the server and decrypting it with a public key. But the basic approach is REALLY fast, since arrays in JS are fast, and it can easily beat 99% of the bots.
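One of the variations listed above, adding a fixed offset to each character code, can be sketched as follows. OFFSET, encodeTrackingId and decodeTrackingId are hypothetical names. The encoder is meant to run once offline; only the decoder and the resulting array would ship in the page.

```javascript
// Hypothetical offset; any small integer works as long as encoder and decoder agree.
const OFFSET = 7;

// Run once, offline, to produce the array you paste into the page.
function encodeTrackingId(id) {
  return Array.from(id, (ch) => ch.charCodeAt(0) + OFFSET);
}

// Run in the page whenever the ID is needed.
function decodeTrackingId(codes) {
  return codes.map((code) => String.fromCharCode(code - OFFSET)).join("");
}

const encoded = encodeTrackingId("UA-1111111-1");
console.log(encoded.join(","));         // "92,72,52,56,56,56,56,56,56,56,52,56"
console.log(decodeTrackingId(encoded)); // "UA-1111111-1"
```

Like the plain char-code array, this only defeats pattern-matching scrapers. A bot that executes the page's JavaScript can still recover the ID.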

IIS Rewrite URL To Subfolder

Hi, I am trying to match my request URL and redirect it to a subfolder from the Default Web Site.
My default website is crm.domainname.com. If someone requests it, they should be redirected to crm.domainname.com/subfolder.
I tried this:
But it never redirects my request.
UPDATE
I made a change and now match with a regular expression.
It works for 4-5 requests, then it stops redirecting again.
The 'Pattern' field is meant to be filled with a regular expression. Check this link for configuration options.

Check malicious Redirect URL in ASP.NET

I have heard of sites using other sites to redirect users, either to their own site or to hide behind another site. In my code I redirect in a few places, such as after posting a comment (it's easier to use a return URL than to figure out the page from the data given).
How do I check whether the return URL is my own URL? I think I use absolute paths, so I could easily check whether the first character is '/', but then I would lose the flexibility of relative URLs. It would also prevent me from using http://mysite.com/blah in the redirect URL. I could patch the URL by prepending mysite to the string, but I'd need to figure out whether the string is a relative URL or already a mysite.com URL.
What's the easiest way to ensure I only redirect to my own site?
How about, if the redirectUrl contains "://" (which includes http://, https://, ftp://, etc.) then it must also start with "http://mysite.com". If it does not contain "://" then it is relative and should not be a problem. Something like this:
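bool isAbsolute = redirectUrl.Contains("://");
bool isOwnSite = redirectUrl.IndexOf("http://mysite.com") == 0;
// Redirect if the URL is relative, or absolute and starting with http://mysite.com
if (!isAbsolute || isOwnSite)
{
    Response.Redirect(redirectUrl);
}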
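The check above lets through two risky inputs. One is a protocol-relative URL such as //evil.example, which contains no "://" but points at another host. The other is http://mysite.com.evil.example, which does start with http://mysite.com. A different technique is to resolve the URL against your own origin and compare origins. It is sketched here in JavaScript, for consistency with the other examples on this page, rather than C#. safeRedirectTarget and SITE_ORIGIN are hypothetical names.

```javascript
// Hypothetical site origin, matching the example in the answer.
const SITE_ORIGIN = "http://mysite.com";

// Resolve the return URL against the site's origin and accept it only if the
// result is still on that origin. Relative paths resolve onto the site;
// absolute or protocol-relative URLs pointing elsewhere do not.
function safeRedirectTarget(returnUrl, fallback = "/") {
  let resolved;
  try {
    resolved = new URL(returnUrl, SITE_ORIGIN);
  } catch {
    return fallback; // unparsable input
  }
  return resolved.origin === SITE_ORIGIN
    ? resolved.pathname + resolved.search + resolved.hash
    : fallback;
}

console.log(safeRedirectTarget("/blah"));                           // "/blah"
console.log(safeRedirectTarget("http://mysite.com/blah?x=1"));      // "/blah?x=1"
console.log(safeRedirectTarget("//evil.example/"));                 // "/"
console.log(safeRedirectTarget("http://mysite.com.evil.example/")); // "/"
```

Returning only the path, query and hash means the redirect always stays on the current host, whatever the user supplied.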
I hadn't thought of this before, but how about using an encrypted version of the URL in the query string parameter?
Alternatively, you could keep a list of the actual URLs in some persistent store (persistent for a couple of hours, maybe) and include only the index into that store in the query string. Since you'd be the only code manipulating this persistent, server-side store, the worst a malicious user could do would be to redirect to a different valid URL.
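The store-and-index idea can be sketched with an in-memory map. registerReturnUrl, resolveReturnUrl, the two-hour TTL and the sequential tokens are all assumptions for illustration. A real site would keep the store in session or cache storage and evict expired entries.

```javascript
// Hypothetical server-side store: token -> { url, expires }.
const TTL_MS = 2 * 60 * 60 * 1000; // "a couple of hours"
const redirectStore = new Map();
let nextToken = 1;

// Called by your own code when it builds a link; only the token goes in the query string.
function registerReturnUrl(url, now = Date.now()) {
  const token = String(nextToken++);
  redirectStore.set(token, { url, expires: now + TTL_MS });
  return token;
}

// Called when the user comes back with the token; unknown or expired tokens get the fallback.
function resolveReturnUrl(token, now = Date.now(), fallback = "/") {
  const entry = redirectStore.get(token);
  if (entry === undefined || entry.expires < now) return fallback;
  return entry.url;
}

const token = registerReturnUrl("/comments/42");
console.log(resolveReturnUrl(token));                          // "/comments/42"
console.log(resolveReturnUrl("999"));                          // "/"
console.log(resolveReturnUrl(token, Date.now() + TTL_MS + 1)); // "/"
```

Because the lookup can only return URLs your own code registered, a tampered token leads at worst to another valid URL or to the fallback.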
This seems to be an odd question, and it should not be a concern if you are in full control of the redirect process. If for some reason you allow user input to be actively involved in a redirect (as in the code below)
Response.Redirect(someUserInput);
Then, yes, a user could have your code send them off to who knows where. But if all you are ever doing is
Response.Redirect("/somepage.aspx")
Then those redirects will always be on your site.
Like I said, it seems to be an odd question. The more prominent concerns in terms of user input are typically SQL Injection attacks and cross-site scripting. I've not really heard about "malicious redirects."

Resources