Blocking referral spam traffic in asp.net without modifying the web.config

I'm using Google Analytics and I'm using filters to remove referral spams. In my web.config file, I also use this:
<rule name="buy-cheap-online.info" patternSyntax="Wildcard" stopProcessing="true">
<match url="*" />
<conditions>
<add input="{HTTP_REFERER}" pattern="*.buy-cheap-online.info" />
</conditions>
<action type="AbortRequest" />
</rule>
I have dozens of these rules and I want to add more. There's this file on GitHub that includes a list of spammers: https://github.com/piwik/referrer-spam-blacklist/blob/master/spammers.txt
I could just keep adding rules to the web.config, but it seems messy. What's another way to block referral spam traffic in ASP.NET so that all the sites in the text file are blocked, and so that when the file changes I can easily add new sites by re-uploading the text file?
Note: I'm not asking for code to be written for me; I just want to know what other options I have.

That's right, keeping on adding rules will be messy and, even worse, useless: most of the spam in GA never reaches your site. There is no interaction at all, so any server-side solution like the web.config won't have any effect.
We can divide the spam into 2 main categories:
Ghost Spam, which never interacts with your page, so any server-side solution like the web.config or the htaccess file won't have any effect and will only fill your config files with useless rules.
Some people still hesitate because they think creating filters just hides the problem instead of blocking it. But there is nothing to block; it is just someone inserting fake records into your GA reports.
And Crawler Spam, which, as the name implies, does access your website and can be blocked server-side (see the sketch below), but there are only a few crawlers compared with the ghosts.
To give you an idea, there are around 8 active crawlers versus more than 100 ghosts, and the number grows every week, because the ghost method is easier for the spammers to implement.
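For the crawler portion, if you still want a server-side block driven by the spammers.txt file from the question, a minimal sketch could be an IHttpModule that reads the file and rejects matching referrers. All names and paths here are assumptions (the file is saved to App_Data, and the module still needs to be registered in web.config):

using System;
using System.IO;
using System.Linq;
using System.Web;
using System.Web.Hosting;

// Hypothetical module: rejects requests whose Referer matches a domain listed
// in App_Data/spammers.txt (one domain per line, # for comments), so updating
// the block list is just a matter of re-uploading the text file.
public class ReferrerSpamModule : IHttpModule
{
    private static readonly Lazy<string[]> Spammers = new Lazy<string[]>(() =>
        File.ReadAllLines(HostingEnvironment.MapPath("~/App_Data/spammers.txt"))
            .Select(line => line.Trim())
            .Where(line => line.Length > 0 && !line.StartsWith("#"))
            .ToArray());

    public void Init(HttpApplication app)
    {
        app.BeginRequest += (sender, e) =>
        {
            Uri referrer = app.Context.Request.UrlReferrer;
            if (referrer != null &&
                Spammers.Value.Any(domain =>
                    referrer.Host.EndsWith(domain, StringComparison.OrdinalIgnoreCase)))
            {
                // Roughly what AbortRequest did in the rewrite rules: refuse the request.
                app.Context.Response.StatusCode = 403;
                app.CompleteRequest();
            }
        };
    }

    public void Dispose() { }
}

Keep in mind this only helps against the crawlers; the ghost traffic described above never sends a request for this code to inspect.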
The best way to get rid of all the ghosts is a single filter based on your valid hostnames.
You can find more information about ghost spam and the solution here:
https://stackoverflow.com/a/28354319/3197362
https://moz.com/ugc/stop-ghost-spam-in-google-analytics-with-one-filter
Hope it helps.

Related

Default page returns 404 only when using search bot user agent

I have created a website using ASP.NET Web Pages (not MVC, not Web Forms).
If I access the default page at mydomain.com in a browser it shows the default page (index.cshtml) fine. However, search engines are seeing a 404 page, and if I change my user agent to Googlebot or Bingbot I get a 404 error too.
This only affects the default page - if I use mydomain.com/index.cshtml I don't get the 404 page.
There is no user agent detection in my code.
I have watched the headers and there are no redirects, just an immediate 404 response, and only when using a bot user agent.
Is there some built-in user agent detection that affects default pages in ASP.NET web pages? Or could my hosting company be doing something (Arvixe)?
I can add code if it helps (but not sure what code I would add), or link to the web site.
I found the cause of the problem.
Apparently Arvixe websites have been hacked. The hackers inserted some code in web.config that displayed a different URL in place of the home page for bots only...
<rewrite>
<rules>
<rule name="1" patternSyntax="ECMAScript" stopProcessing="true">
<match url="^$" ignoreCase="true" negate="false" />
<conditions logicalGrouping="MatchAny" trackAllCaptures="false">
<add input="{HTTP_USER_AGENT}" pattern="Googlebot|Yahoo|MSNBot|bingbot" />
</conditions>
<action type="Rewrite" url="bot.asp" />
</rule>
</rules>
</rewrite>
I did see titles/descriptions for sports jerseys in Bing search results for my website, which is why I was investigating this.
From a search it appears this has affected lots of Arvixe customers; most will probably never know, as they are unlikely to view their website with a search bot user agent.
It looks like Arvixe were aware of the hacking and have already put a stop to this by removing the spam file (bot.asp or bot.php), but they have not fixed the web.config. If you have shared hosting with Arvixe you should check for this now.
You should also check your Google Search Console/Analytics accounts for added owners/users, as some have reported this too, although you would have got an email warning of that.
I changed all my Arvixe passwords, but I doubt they got individual account passwords; they probably hacked in at the server level.

URL masking? maybe?

Have a .net webapp running on IIS.
I have run across something I haven't really had to deal with before. We have partners or clients that have their own "pages" on our domain. Currently the URL is www.mydomain.com/?code=partnercode, but for ease of use on business cards and such they want it to be more like www.mydomain.com/partnername, and I am not sure how to do this.
I know we can do something like the following in the .htaccess in Apache:
RewriteRule ^partnername$ index.php?code=partnerid [L]
I am wondering if there is some way to do this in the web.config? There has got to be something, but I am unsure where to look. I have tried those online htaccess-to-web.config converters and they failed miserably. The other thing is that I would prefer not to have to change the partnerID values that we already have in the DB.
I found this on another question on this site, but I don't think it will do what I need. It changes the URL in the browser once the user hits the page, but I also want the page to be accessible via the www.mydomain.com/partnername URL as well.
if ( $_SERVER['REQUEST_URI'] == '/index.php?code=partnerid') {
echo '<script type="text/javascript">window.history.pushState("", "", "/partnername");</script>';
}
IIS has an extension that partially supports your scenario; it is called URL Rewrite.
I said partially because you can use it to rewrite URLs from www.mydomain.com/partnername to www.mydomain.com/?code=partnername. What it doesn't support (at least I don't think it does) is mapping a partner name to a partner code (unless you have a small number of partners and can add a rewrite rule for every one of them).
And here is an article showing a fraction of what you can do with URL Rewrite.
In your case if you want to rewrite www.mydomain.com/partnername to www.mydomain.com/?code=partnername, your rule configuration could look something like this (not tested on IIS):
<rewrite>
<rules>
<rule name="Rewrite to query string">
<!-- I'm using hardcoded text, but it is regular expression and you can
write very advanced conditions -->
<match url="^(partner)" />
<!-- changes incoming url /partner to ?code=partner -->
<action type="Rewrite" url="?code={R:1}" />
</rule>
</rules>
</rewrite>
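If the name-to-code mapping has to come from your existing DB rather than from hardcoded rules, another option (just a sketch, not something covered above; Default.aspx and the dictionary are placeholders for whatever really serves the landing page and whatever lookup you use) is to rewrite the path yourself early in the pipeline:

using System;
using System.Collections.Generic;
using System.Web;

// Hypothetical Global.asax code-behind: maps /partnername onto the existing
// ?code=partnerid handling without changing the URL the visitor sees.
public class Global : HttpApplication
{
    // Placeholder lookup; a real implementation would query the partner table.
    private static readonly Dictionary<string, string> PartnerCodes =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "partnername", "partnerid" }
        };

    protected void Application_BeginRequest(object sender, EventArgs e)
    {
        // "/partnername" -> "partnername"
        string segment = Request.Url.AbsolutePath.Trim('/');

        string code;
        if (segment.Length > 0 && PartnerCodes.TryGetValue(segment, out code))
        {
            // Assumes Default.aspx is the page that handles ?code= today.
            Context.RewritePath("~/Default.aspx", null, "code=" + code);
        }
    }
}

The business-card URL stays www.mydomain.com/partnername while the app keeps seeing the code query string it already expects.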

IIS 7 URL Rewrite for 404 and Sitefinity

We have a new Sitefinity site that is replacing our marketing site. The switchover happened last Friday, and we uncovered a problem today: there is content (PDFs, JPGs) on the old site that can no longer be accessed and did not make it into the content migration plan. On top of that, management has removed rollback as an option.
So, the solution I have come up with is to use IIS 7's url rewriting module to point to a new url that hosts the old site so that content can be accessed. This is the xml in my web.config that I have come up with:
<rewrite>
<rules>
<rule name="RedirectFileNotFound" stopProcessing="true">
<match url=".*" />
<conditions logicalGrouping="MatchAll">
<add input="{REQUEST_FILENAME}" matchType="IsFile" negate="true" />
<add input="{REQUEST_FILENAME}" matchType="IsDirectory" negate="true" />
<add input="{URL}" negate="false" pattern="/\.*$" />
</conditions>
<action type="Redirect" url="http://www.oldsite.com{REQUEST_URI}" appendQueryString="true" />
</rule>
</rules>
</rewrite>
It attempts to test whether the URL resolves to a file or folder, and makes sure that we are requesting something with an extension. If the conditions pass, it redirects to the same location on the old site. Ideally, this would mean that anything that previously linked to the old content could be left alone.
The problem is, nothing gets redirected.
By fiddling with the rules, I have verified that the module is operational, i.e. I can set it up to rewrite everything and it works. But these rules do not work.
My theory is that since Sitefinity uses database storage, it somehow short circuits the "IsFile" match type. Complete guess, but I'm kind of at a loss at this point.
How do I use URL rewriting to redirect 404s in this manner?
I am not sure how the rewriter is implemented, but those rules seem to be too general. Sitefinity uses the routing engine and registers a series of routes that it handles. By definition, those routes are interpreted sequentially, so if a more general rule exists before a more specific one, the latter will not work.
I suspect what may be happening is that the Sitefinity rules already handle the request before the rewriter gets a chance to redirect it. What I can advise is to either implement more specific rewrite/redirect rules, or just handle the whole issue using a different approach. What was the reason your old files were inaccessible after the migration? Can you give a specific URL that fails to return the file, so we can work with a scenario?
This is just a shot in the dark, but do you have "file system fallback" enabled in the Sitefinity advanced settings for libraries? Perhaps the module is intercepting the request and not letting it proceed to the file system...
Thank you guys for your help, but it turned out to be a problem with dynamically served content in general.
Assume that all requests are actually handled by a Default.aspx page. This isn't the way that Sitefinity works, but it is the way that DotNetNuke works, and illustrates the problem nicely.
The url rewrite isfile and isdirectory flags check for physical existence of files. In this case, only Default.aspx actually physically exists. All the other dynamically served content is generated later in the request cycle, and has no physical existence whatsoever.
Because of that, the isfile flag will always fail, and the redirect rule will always execute.
The solution I went with was to let IIS and .NET handle the 404s themselves, which properly respects generated content, and route them to a custom error page, 404redirection.aspx. There is code on that page that redirects to my old site, where the missing content is likely to be hosted. That site then has additional 404 handling that routes back to the 404NotFound.aspx page, so requests for files that don't exist in either system make a round trip and look like they never went anywhere. This also has the nice side effect that pages which aren't found on the old server get to display our new, pretty, rebranded 404 on the new server.
Simply put, rather than attempting to pre-empt the content generation and error handing, I took a more "go with the flow" approach, and then diverted the flow at a more opportune time.
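For anyone wanting to replicate this, here is a rough sketch of what the code-behind of such a 404redirection.aspx page could do, assuming customErrors (or equivalent error handling) sends 404s to it with the usual aspxerrorpath query string; the old host name is just the placeholder from the rule above:

using System;
using System.Web.UI;

// Hypothetical code-behind for 404redirection.aspx: forwards requests that
// 404'd on the new site to the same path on the old site.
public partial class Error404Redirection : Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        // With the default customErrors redirect, the missing path arrives as
        // ?aspxerrorpath=/some/old/file.pdf
        string originalPath = Request.QueryString["aspxerrorpath"];

        if (!string.IsNullOrEmpty(originalPath))
        {
            Response.Redirect("http://www.oldsite.com" + originalPath, false);
            Context.ApplicationInstance.CompleteRequest();
        }
        // Otherwise fall through and render whatever friendly 404 markup the page has.
    }
}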

IIS7 URL Rewriting Outbound rules

I can't seem to get my head around these rewrite rules for some reason and I was hoping you guys could help. What I want is an outbound rule that will rewrite paths for link, img, script, and input tags.
I want to change this: http://www.mysite.com/appname/css/file.css
To this: http://cdn.mysite.com/css/file.css
So, basically I need to swap the host name and drop the app name from the URL. I've got the pre-condition filters to *.aspx files set already, but the rest seems like Greek to me.
EDIT for clarity
The appname in the URL above is an application in IIS. It's a placeholder for whatever appname happens to be in use. It could be any of over 50 different apps with our current setup. There will ALWAYS be an appname. Perhaps that will make the rule even easier.
The hostname, in this case www.mysite.com, can also vary slightly in terms of the subdomain. It might be www1.mysite.com, www2, etc. Also, I just realized that I need to maintain SSL if it's there.
So, I guess when it comes down to it, I really just need to take the URL, minus the appname, and append it to the new domain, while respecting the protocol that was used.
Original URL: http(s)://{host}/{appname}/{URL}
Output: http(s)://cdn.mysite.com/{URL}
I assume your website domain is always the same; in that case this rule should do:
<rule name="CdnRule" preCondition="OnlyAspx" >
<match filterByTags="Img, Input, Link, Script" pattern="^(.+)://.+?\.(.+?)/.+?/(.*)" />
<action type="Rewrite" value="{R:1}://cdn.{R:2}/{R:3}" />
</rule>
<preConditions>
<preCondition name="OnlyAspx">
<add input="{PATH_INFO}" pattern=".+\.aspx$" />
</preCondition>
</preConditions>
EDIT: changed according to clarified question
I assume the subdomain (www, www2, ...) is always there and that it has to be ignored in the target URL.

ASP .NET page name "alias"

I have a website of which I have two versions: one in Spanish and one in English. They are located on different servers and different domains, so they actually behave as two different websites.
I only have one ASP.NET project, and depending on the domain, I show all the text in Spanish or in English. That's working fine.
I developed it first in Spanish, so my page names are written in Spanish, like "Buscar.aspx" ("Buscar" means "Search").
I would also like to translate the page names, so that the English names appear in the browser's address bar. For instance, for my page "Buscar.aspx" I would like "Search.aspx" to appear in the address bar.
So my question is: is there any way to declare some kind of "alias" (or some other mechanism), so that I can process requests to "Buscar.aspx" and "Search.aspx" through one single ASP.NET page, but still have them appear in the address bar as two different addresses?
URL Rewriting
You could rewrite Search.aspx to Buscar.aspx
<rewrite>
<rules>
<rule name="Search">
<match url="^Search.aspx" />
<action type="Rewrite" url="Buscar.aspx" />
</rule>
</rules>
</rewrite>
These rules could then be put in your English site's web.config file.
Have a look at routing. You can find some documentation here: ASP.NET Routing
Routing means that you can specify a path that maps to a certain ASPX page. If you switch the routing configuration based on your language setting, you have what you need :)
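A minimal sketch of that routing idea, assuming Web Forms on .NET 4+ and a call to this from Application_Start in Global.asax (the route name is a placeholder; page names follow the question's example):

using System.Web.Routing;

// Hypothetical route registration: requests for Search.aspx are served by the
// existing Buscar.aspx page, so both URLs reach the same code.
public static class RouteConfig
{
    public static void RegisterRoutes(RouteCollection routes)
    {
        // The physical Buscar.aspx keeps working as-is; this only adds the English alias.
        routes.MapPageRoute("SearchEnglish", "Search.aspx", "~/Buscar.aspx");
    }
}

Since the two domains run as separate sites, registering the alias on both should be harmless; only the English site will ever receive requests for Search.aspx.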
