Google Tag Manager transforming root URL - iframe

A client of ours uses an error reporting service that logs errors thrown across the site and alerts us when a certain threshold is hit. This morning we received a large number of errors relating to the Google Tag Manager service, but I'm unsure how or why these 404s are occurring, and I was wondering if anyone had seen similar behavior before.
What appears to be happening is that the Google Tag Manager iframe URL is being appended to the site's root URL, which then causes a 404.
To illustrate this more clearly, we have the following iframe code:
<iframe src="//www.googletagmanager.com/ns.html?id=GTM-KJHJ" ...></iframe>
and we are seeing the following URL returned as a 404:
www.website.com//www.googletagmanager.com/ns.html?id=GTM-KJHJ
Has anyone experienced this before, or can anyone think of a reason why it would suddenly start occurring?
As a side note, this has been happening repeatedly from various IPs for the last few hours.
Thanks.

Scheme-relative URLs work fine in all common browsers, so these errors are probably caused by crawlers that do not parse such URLs correctly. Have you checked the IP addresses in your logs to see where they come from?
Independent of that, if you need to get rid of these 404 errors, adding the protocol should do the trick: change the GTM code so that both URLs start with an explicit protocol (http:// or https://) and the 404 errors should go away.
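For example, keeping the container ID from the question, the noscript iframe with an explicit scheme would look something like this (a minimal sketch; the https:// variant is assumed here):
<noscript><iframe src="https://www.googletagmanager.com/ns.html?id=GTM-KJHJ"
height="0" width="0" style="display:none;visibility:hidden"></iframe></noscript>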

Related

404 error on my homepage although I can see my site

I am at my wits' end with the following problem:
My site www.sebastianthalhammer.com is available under that URL without any problems.
However, Google Search Console as well as other external third-party test tools return a 404 error.
Status report from Uptrends
It is just the main page that's affected. None of the other subpages or blog content are affected.
I have been in contact with the server staff, but everything seems fine to them. As mentioned, the site can be reached. The site runs on WordPress, latest version.
I have no real clue where to start, as this error seems to be quite a tricky one. Does anyone here have an idea what's going on?
Sebastian
The HTTP spec defines the 4xx class like this:
The 4xx class of status code is intended for cases in which the client seems to have erred. Except when responding to a HEAD request, the server SHOULD include a representation containing an explanation of the error situation, and whether it is a temporary or permanent condition. These status codes are applicable to any request method. User agents SHOULD display any included representation to the user.
This leaves me with two possible explanations:
Explanation 1: it's a server error.
- the server wrongly returns a 404 status code
- the browser thinks the response body contains details about the error and displays it; for the end user this is the actual page
Explanation 2: it's done on purpose to defeat crawlers and page watchers.
- the server returns 404 on purpose; non-browser user agents won't process the result, as they interpret it as an error
- browsers are unaffected, and the end user doesn't care as long as the page is displayed
The second one would indeed be kind of clever if you don't want your page to be indexed.
Thanks to your feedback I was able to think about the problem in a different way.
Ultimately, in the unholy depths of a certain plugin, I dug out a setting that caused the error.
It was a redirection plugin that (for whatever reason) sent out a 404 status code when the URL was requested.
I don't know what the purpose of that would be. All I know is that the setting had been on its default for quite a while, and that caused the weird situation.
Thanks, guys, for getting me on the right track.
Sebastian

Server Log Showing Many 'Unhandled Exceptions' From URL with &hash=

I've noticed a large increase in the number of events logged daily that have &hash= in the URL. The requested URL is the same every time, but the number that follows &hash= is always different.
I have no idea what the purpose of the &hash= parameter is, so I'm unsure if these attempts are malicious or something else. Can anyone provide insight as to what is being attempted with the requested URL? I have copied in one from a recent log below.
https://www.movinglabor.com:443/moving-services/moving-labor/move-furniture/&du=https:/www.movinglabor.com/moving-services/moving-labor/move.../&hash=AFD3C9508211E3F234B4A265B3EF7E3F
I have been seeing the same thing in IIS on Windows Server 2012 R2. They were mostly HEAD requests. I did see a few other, more obvious attack attempts from the same IP address, so I'm assuming the du/hash thing is also intended to be malicious.
Here's an example of another attempt which also tries some url encoding to bypass filters:
part_id=D8DD67F9S8DF79S8D7F9D9D%5C&du=https://www.examplesite.com/page..asp%5C?part...%5C&hash=DA54E35B7D77F7137E|-|0|404_Not_Found
So you may want to look through your IIS logs to see if they are trying other things.
In the end I simply created a blocking rule for it using the URL Rewrite extension for IIS.
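For reference, a minimal sketch of such a rule in web.config (the rule name is arbitrary; the rule aborts any request whose path or query string contains &hash=):
<system.webServer>
  <rewrite>
    <rules>
      <!-- Drop requests carrying the suspicious hash token -->
      <rule name="BlockHashProbes" stopProcessing="true">
        <match url=".*" />
        <conditions logicalGrouping="MatchAny">
          <add input="{URL}" pattern="&amp;hash=" />
          <add input="{QUERY_STRING}" pattern="&amp;hash=" />
        </conditions>
        <action type="AbortRequest" />
      </rule>
    </rules>
  </rewrite>
</system.webServer>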

Are these requests to www.mydomain.com/css/about:blank a broken link or something else?

We recently implemented a new error page on our website which sends emails to the webmaster containing the most recent server exception. We are running an ASP.NET 4 application, and last night we got many emails that all contained the same error:
A potentially dangerous Request.Path value was detected from the client (:).
These errors we have seen before, but the odd thing is the path that is being requested. It is always the path:
http://www.mydomain.com/css/about:blank
I have scoured the different pages and can find no anchor tag that appears to point to any link like this. Is this an issue with our application or something else? In other words, do we need to fix anything or just ignore these?
Also, this path was requested consistently, seemingly by the same users, and often was requested from multiple pages they visited. User-agents ranged from Firefox to IE7 and 8.
Have you done anything like this in your CSS: background-image:url(about:blank);
This shouldn't generate an HTTP request, however, so I suspect you might have a ./about:blank in there instead.
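To illustrate the difference (a hypothetical stylesheet served from /css/style.css):
/* No HTTP request: "about:blank" is parsed as a URL with the about: scheme */
.spacer { background-image: url(about:blank); }
/* Generates a request: "./about:blank" is a relative path, resolved against
   the stylesheet's directory, so browsers ask the server for /css/about:blank */
.broken { background-image: url(./about:blank); }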

using customErrors for vanity URLs / asp.net url redirection

So, from here...
In ASP.NET, you have a choice about how to respond to that - it's in the web.config as customErrors. Turn that on, then redirect to a fancy 404 page (maybe you already do). The fancy 404 page could then check the requested querystring (which gets passed over to the custom error page as yet another querystring) to see if it's a valid redirect, lives in your database, etc. Just do a Response.Redirect() from there.
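As a sketch of what that advice looks like inside the custom 404 page (ASP.NET appends the original path as the aspxerrorpath querystring; LookupVanityUrl is a hypothetical helper standing in for your own database lookup):
// Inside the custom error page's Page_Load
var originalPath = Request.QueryString["aspxerrorpath"]; // appended by customErrors
var target = LookupVanityUrl(originalPath);              // hypothetical database lookup
if (target != null)
{
    Response.Redirect(target); // send the visitor on to the real page
}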
Then schooner writes:
Thanks, we do have a 404 now, but we would prefer this not to be detected as a 404 in the process. We would like to handle it directly and separately if possible.
..and I'd like to know just how bad a practice this is. I don't expect to put my "pretty" URLs on the internet (just business cards) and I have a sample of 404-redirecting-to-a-helpful-site code working, but I don't want to get to production and have an issue with a browser that takes the initial 404 too seriously. Can anyone help me understand more about why I wouldn't want to use customErrors / 404 to flow users to the page they actually wanted?
The main problem with using customErrors as your 404 handler is that instead of throwing a 404 back to the browser to signal a bad request, customErrors returns a 302, which indicates that the page has been relocated to whatever your customErrors page is. This isn't bad for most users, because they don't know or even notice the difference; the problem comes from the fact that web crawlers DO know the difference, and the status code they receive directly affects how their indexing works.
Consider the scenario where you have a page at http://mysite.com/MyAwesomePageAboutStuff.aspx for some period of time and then one day decide you no longer need it and delete the file. If Google or some other crawler has already indexed that URL and goes back to it after you delete it, the crawler will get a 302 status code instead of a 404, and because of this it will update the page's URL to point to your error page rather than deleting the non-existent link. Now, whenever someone finds that URL through a search engine, they'll end up at your error page.
It's not really a huge issue, but you can definitely see the headaches this can create for your users in the long run.
Look here for some corroborating data.
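If you want the friendly page without the 302, one option (available from .NET 3.5 SP1 onward) is redirectMode="ResponseRewrite", which serves the error page in place of the failed request; the error page itself should then set the real status code. A minimal sketch, with ~/NotFound.aspx as a placeholder name:
<customErrors mode="On" redirectMode="ResponseRewrite">
  <error statusCode="404" redirect="~/NotFound.aspx" />
</customErrors>
and in the error page's code-behind:
Response.StatusCode = 404; // keep the 404 visible to crawlers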
I created a vanity URL system using the 404 page as the handler. There's no need for a 302 on my side, as the 404 handler dynamically loads the content and returns it. I am fully able to handle any and all POST/GET and SERVER data.
Works great. If you are interested TarantulaHawk is up on SourceForge.

WebResource.axd requested without parameters - This is an invalid webresource request

I'm finding this problem every now and then in my production website, and it has me absolutely stumped...
My app works perfectly in both dev and production, but every now and then, I get an e-mail from my global error handling with this:
MESSAGE: This is an invalid webresource request.
URL: /WebResource.axd
(which means that for some reason WebResource.axd was requested without any GET parameters)
I'm not doing anything with WebResource.axd myself; I don't serve any of my resources through it. It's only used automatically by .NET to serve its typical JS for validators, etc.
Any idea why this might be getting requested without parameters?
Has anyone encountered this?
That is definitely a bot not doing a very good job of crawling your web site. It processes your web form and locates a reference to WebResource.axd, for example:
<script src="/site/WebResource.axd?d=MtIW_TBRtZCvAXDMJGwg4g2&t=633772897740666651" type="text/javascript"></script>
The bot expects only static JavaScript files and tries to download the resource by requesting WebResource.axd without parameters. The result is an exception thrown by the System.Web.Handlers.AssemblyResourceLoader class and intercepted by Application_Error in Global.asax.
I believe this exception is harmless - the client will receive a 404 error. You can safely ignore it.
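If the alert e-mails are the annoying part, you can filter these out before notifying yourself. A minimal sketch for Global.asax.cs (SendErrorEmail is a hypothetical stand-in for your own notifier):
protected void Application_Error(object sender, EventArgs e)
{
    var request = HttpContext.Current?.Request;
    // Bots requesting /WebResource.axd without the "d" parameter trigger an
    // HttpException from AssemblyResourceLoader; skip the alert for those.
    if (request != null
        && request.Path.EndsWith("WebResource.axd", StringComparison.OrdinalIgnoreCase)
        && string.IsNullOrEmpty(request.QueryString["d"]))
    {
        return; // harmless bot noise; the client still gets the error response
    }
    SendErrorEmail(Server.GetLastError()); // hypothetical helper
}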
We also have all of our errors emailed to us, and we occasionally get those too. They never seem to have a referrer, and the user agent is usually a little wacky. We write them off as bots.
I just checked a couple of the offending client IPs against ARIN, and one of them belonged to a web-spidering-type organization, so there's a little more evidence for the bot theory.
I would also log the user agent that made the request to WebResource.axd. It wouldn't surprise me if it was a bot crawling your site.
This discussion...
http://www.telerik.com/community/forums/aspnet/spell/this-is-an-invalid-webresource-request.aspx
... and this linked MSDN article...
http://msdn.microsoft.com/en-us/magazine/cc163708.aspx
... might shed a little light (though not much).
