My Spring MVC application has many public-facing pages with URLs in this format:
http://www.example.com/?productId=123456
Somewhere along the way, a major search engine picked up an extraneous URL and started spidering thousands of pages with a corrupt URL (note the ;) in this format:
http://www.example.com/;?productId=123456
Strangely, Spring MVC completely ignores the ;. How can I detect this extraneous ; and issue a redirect to the correct URL?
Apparently nothing in Spring MVC handles this, so I just placed a simple servlet Filter in front of it.
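As far as I know, servlet containers treat a ; in a path segment as the start of path parameters (the same mechanism as ;jsessionid=...), which would explain why the request mapping ignores it. The check such a filter has to make is small. Below is a minimal sketch of the logic only, written in Python rather than as a real servlet Filter; canonical_url is a made-up name, and a real filter would issue a 301 to the returned URL.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> Optional[str]:
    """Return the URL to redirect to, or None if the path contains no ';'."""
    parts = urlsplit(url)
    if ";" not in parts.path:
        return None  # nothing to clean up, let the request through
    # Drop everything from ';' to the end of each path segment.
    segments = [segment.split(";", 1)[0] for segment in parts.path.split("/")]
    return urlunsplit(parts._replace(path="/".join(segments)))
```

Only the path is inspected, so a ; inside the query string does not trigger a redirect.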
I stumbled upon some strange behaviour today on a website at work. Our SEO consultant wanted some strange-looking links removed from Google's index, a seemingly straightforward task. But it turned out to be very difficult.
The website was a .NET MVC 5.2.3 application. We looked at routing, our own libraries, etc. Nothing strange. After a while we gave up and tried simply redirecting requests to these URLs by setting up a rule in web.config. It turns out these URLs are unmatchable! Somehow, under the right conditions, the critical part of the URL seems to evade rewrite rules as well as routing later on in the MVC application.
We narrowed down the mysterious URLs to the format (T(anything)) where T can be any capital letter and anything can be, well, anything. This is placed at the beginning of the URL as if it were a directory. In regex: \([A-Z]\([a-zA-Z0-9]*\)\)
I've tested and found the same behaviour on:
.net MVC5 sites
.net MVC3 sites
.net Web Forms sites
http://asp.net
http://stackoverflow.com
Some examples from stackoverflow.com:
Bypasses routing: https://stackoverflow.com/(K(jonas))/questions
Routes normally (404): https://stackoverflow.com/jonas/questions
Bypasses routing: https://stackoverflow.com/(G(hello))/users/1049710/jonas-%C3%84ppelgran
Routes normally (404): https://stackoverflow.com/gandhello/users/1049710/jonas-Äppelgran
It doesn't seem to affect the whole web, so it shouldn't be a browser or HTTP issue. Some examples:
Routes normally (404): http://php.net/(T(testing))/downloads
Routes normally (404): https://www.iana.org/(T(testing))/domains/reserved
Can anybody explain what is going on?
And what can I do to prevent these URLs from bypassing routing?
Apparently this is a feature called a "cookieless session" in ASP.NET. See "Cookieless SessionIDs" section here in the MSDN docs.
The basic idea is that instead of storing the session id (if session state is enabled) in a cookie, it's now embedded in the URL.
We (Stack Overflow) disable session state entirely (by setting sessionState mode to off). As far as I know, the end result is that any time one of the URLs that match the session id format is used, that information is simply discarded.
None of the links leading to us in Google include it either, which makes me think that your site may be configured to actually generate session IDs in URLs? Short of disabling the feature, there's probably not much you can do here. Although, see "Regenerating Expired Session Identifiers" on the MSDN page I linked above to see how to at least prevent accidental session sharing if that's not already done.
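To make "simply discarded" concrete, here is a rough Python sketch of how a URL carrying such a segment maps back to the plain URL. It uses the pattern from the question, not the exact format ASP.NET parses (real identifiers may take other forms), and discard_session_segment is a hypothetical name.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Pattern taken from the question, anchored to the first path segment.
SESSION_SEGMENT = re.compile(r"^/\([A-Z]\([a-zA-Z0-9]*\)\)(?=/|$)")


def discard_session_segment(url: str) -> str:
    """Return the URL with a leading (T(anything)) path segment removed."""
    parts = urlsplit(url)
    # Fall back to "/" when the segment was the whole path.
    path = SESSION_SEGMENT.sub("", parts.path, count=1) or "/"
    return urlunsplit(parts._replace(path=path))
```

Comparing the result with the requested URL would also tell you whether a request carried the segment, which is what a redirect to the clean URL needs.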
So I've set up an HTML5 single page application, and it's working well. The page is at /App/ and when someone goes to content it looks like /App/Content/1234.
One problem: If the user refreshes the page the server can't find that URL because it doesn't actually exist. If I send them to /App/#/Content/1234, they're golden, but what is the best way to do this? I have a LOT of different styles of URL under /App.
What is the best way to globally catch any request under ~/App/(.*) and redirect it to ~/App/#/$1?
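The mapping being asked for is just a regex substitution on the request path. A minimal Python sketch of that logic (not MVC routing code; hash_redirect is a made-up name) could look like this:

```python
import re
from typing import Optional

# Anything below /App/ with at least one more character after the slash.
APP_PATH = re.compile(r"^/App/(.+)$")


def hash_redirect(path: str) -> Optional[str]:
    """Return the /App/#/... redirect target, or None if no redirect applies."""
    match = APP_PATH.match(path)
    if match is None:
        return None
    return "/App/#/" + match.group(1)
```

Browsers do not send the fragment to the server, so the redirected request arrives as /App/ and does not match again, which avoids a redirect loop.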
The only route registered in MVC is the standard OOTB route.
Sounds like your server is not re-writing the urls to the app's base URL.
The URL re-writing needed on the web server is server-dependent. For Apache, you'd use mod_rewrite.
Instead, switch Angular to the "Hashbang mode" (the default) so the urls will all store the local state after the # in the url.
I don't want my apps to require server configuration changes, so I recommend hashbang mode.
See the "Hashbang and HTML5 Modes" section of the AngularJS docs. The HTML5 mode section describes all the server configuration needed to support HTML5 mode URLs.
This awesome dude describes how to fix this here.
In brief:
Remove the MVC NuGet packages (unless you use MVC controllers for anything); you can keep the Web API NuGet packages. Keep the WebPages and Razor packages.
Also delete MVC controllers and views.
You can keep using .cshtml files with some web.config modifications. You'll need this for bundling.
Finally, add a rewrite rule in web.config that points all URLs (excluding content, images, scripts, etc.) to index.html.
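The decision that last rewrite rule makes can be sketched as follows. This is Python for illustration only, not web.config syntax, and the folder prefixes (including /api/ for Web API) are assumptions about the project layout.

```python
# Assumed folder names; adjust to whatever the project actually serves directly.
PASS_THROUGH_PREFIXES = ("/content/", "/images/", "/scripts/", "/api/")


def rewrite_target(path: str) -> str:
    """Return the path the server should actually serve for a request."""
    if path.lower().startswith(PASS_THROUGH_PREFIXES):
        return path  # static files and API calls are served as requested
    return "/index.html"  # every other URL is handled by the client-side app
```

Because this is a rewrite and not a redirect, the browser keeps the original URL and the client-side router can read it.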
ASP.NET MVC newbie question:
I've set up an MVC site with a few controllers. My site also has a lot of content files, which are stored in a network of subfolders within my web site, and I need to be able to access them directly, e.g.
http://mydomain.com/Content/Images/Geography/Asia/Japan/TokyoAtNight.jpg
Is there a way to make this a direct pass-through to the content folder, as specified by the path, or do I have to make a Content controller that interprets the rest of the URL and returns the file as some kind of ActionResult? Bear in mind, of course, that there will be lots of different content types, not just JPEGs.
Thanks for your help!
This should work without you doing anything - static files are not processed by the routing engine.
You want to look into Routing, and IgnoreRoute specifically. Here are a couple of places to start.
Asp.Net Routing: How do I ignore multiple wildcard routes?
http://www.asp.net/mvc/tutorials/asp-net-mvc-routing-overview-cs
Take a look at the @Url.Content() helper method.
Url.Content("~/Content/Images/Geography/Asia/Japan/TokyoAtNight.jpg")
Yes.
The IRouteHandler and the route registration in your global.asax is your extensibility point for configuring how MVC handles url paths.
However, by default ASP.NET MVC will allow you to access image files directly, without any additional configuration.
ASP.net 4.0 / IIS7.
I want to "alias" a single web form to appear as various extensionless urls. Given a form in my web root called "mySite.com/ColorWebForm456.aspx":
1. I want it served as multiple names, e.g., mySite.com/Color, mySite.com/Colour, mySite.com/Colors, mySite.com/Coler, etc., without creating folders and duplicate forms with those names.
2. I never want mySite.com/ColorWebForm456.aspx displayed in the browser; it needs to display as mySite.com/Color, even if the user somehow acquires it and types in that exact ~.aspx address.
3. The variations will account for several alternate or mis-spellings users might attempt - I don't want them "corrected", however. So, if a user types in mySite.com/Colour, the url is NOT rewritten to mySite.com/Color, but the same page is served via ColorWebForm456.aspx as the requested "mySite.com/Colour".
I've seen so many articles on this that I'm not even sure where this would be best handled: in Global.asax, IIS7 URL Rewrite, web.config, etc., and I'm not even sure this is technically a case of url rewriting or routing... ?
You can achieve this with ASP.NET 4 routing for web forms, or if you are using MVC it's available too: http://weblogs.asp.net/scottgu/archive/2009/10/13/url-routing-with-asp-net-4-web-forms-vs-2010-and-net-4-0-series.aspx
Except potentially for #3; it would work, but you would have to explicitly declare the variations you want. In that case, rewriting supports regular expression matching, so that might be better for what you are looking for.
Be aware that I had issues setting up URL rewriting with various aspects of web forms. A rewrite module potentially requires some setup (I could have done it wrong too), whereas URL routing is already built in and handled.
HTH.
If you have a rewrite engine available:
(using ISAPI Rewrite .htaccess syntax)
# redirect any requests for the filename back to the friendly URL
RewriteRule ^/ColorWebForm456\.aspx(.*) /Color$1 [NC,R=302]
# rewrite /Color requests to the web form
RewriteRule ^/(Color|Colour)/(.*) /ColorWebForm456.aspx$2
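What these two rules are meant to achieve together can be sketched in Python (an illustration of the decision only, not ISAPI Rewrite code). The alias list comes from the question; resolve and the action labels are made-up names.

```python
from typing import Tuple

PAGE = "/ColorWebForm456.aspx"          # the real web form
CANONICAL = "/Color"                     # what the browser should show instead
ALIASES = {"/color", "/colour", "/colors", "/coler"}


def resolve(path: str) -> Tuple[str, str]:
    """Return (action, target) where action is 'redirect', 'serve' or 'pass'."""
    lowered = path.lower()
    if lowered == PAGE.lower():
        return ("redirect", CANONICAL)  # never display the .aspx name
    if lowered in ALIASES:
        return ("serve", PAGE)          # internal rewrite, URL stays as typed
    return ("pass", path)
```

For example, resolve("/Colour") returns ("serve", "/ColorWebForm456.aspx"), so the misspelling stays in the address bar while the form is served.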
The following article explains it all, but if you will be using a web host after you complete your application, I would check with the hosting provider first to see whether they offer everything you need:
http://learn.iis.net/page.aspx/496/iis-url-rewriting-and-aspnet-routing/
Hope this helps!
As a side note: search engines dislike it if you publish the same content under different URLs ("duplicate content").
So my recommendation would be to stick with one (rewritten) URL, not multiple (rewritten) URLs for the same content.
In my research on friendly URLs, I found 2 ways to implement them.
Both required modifications to the Application_BeginRequest procedure in Global.asax, where you would run your code to do the actual URL mapping (mine was with a database view that contained all the friendly URLs and their mapped 'real' URLs). Now the trick is to get your requests run through the .NET engine without an aspx extension. The 2 ways I found are:
1. Run everything through the .NET engine with a wildcard application extension mapping.
2. Create a custom aspx error page and tell IIS to send 404's to it.
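Either way, the mapping step itself is a lookup from friendly URL to real URL. A minimal Python sketch of that step follows; the dictionary stands in for the database view, and its rows and the name map_request are invented for illustration.

```python
# Hypothetical rows standing in for the database view of friendly URLs.
FRIENDLY_URLS = {
    "/products/blue-widget": "/product.aspx?id=17",
    "/about-us": "/page.aspx?id=3",
}


def map_request(path: str) -> str:
    """Return the real URL for a friendly path, or the path itself if unmapped."""
    # Normalise case and a trailing slash before the lookup.
    key = path.rstrip("/").lower() or "/"
    return FRIENDLY_URLS.get(key, path)
```

Unmapped paths are returned unchanged, so ordinary .aspx requests are not affected.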
Now here's my question:
Is there any reason one of these is better than the other?
When playing around on my dev server, the first thing I noticed about #1 was that it botched FrontPage extensions, not a huge deal but that's how I'm used to connecting to my sites. Another issue I have with #1 is that even though my hosting company is lenient with me (as I'm their biggest client) and will consider doing things such as this, they are wary of any security risks it might present.
#2 works great, but I just have this feeling it's not as efficient as #1. Am I just being delusional?
Thanks
I've used #2 in the past too.
It's more efficient because unlike the wildcard mapping, the ASP.NET engine doesn't need to 'process' requests for all the additional resources like image files, static HTML, CSS, Javascript etc.
Alternatively, if you don't mind the .aspx extension in your URLs, you could use: http://myweb/app/idx.aspx/products/1 - that works fine.
Having said that, the real solution is using IIS 7, where the ASP.NET runtime is a fully fledged part of the IIS HTTP module stack.
If you have the latest version of IIS, there is a rewrite module for it - see here. If not, there are free third-party binaries you can use with older IIS (i.e. version 6) - I have used one that reads the rewrite rules from an .ini file and supports regular expressions, but I can't remember its name, sorry (it's possibly this). I'd recommend this over cheaping out with the 404 page.
You have to map all requests through the ASP.NET engine. IIS decides how to process a request by its file extension. By default it only hands the extensions meant for ASP.NET (.aspx, .ashx, etc.) to the ASP.NET engine. The reason is that running every request through ASP.NET adds overhead to the processing of the request.
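The difference between the default behaviour and a wildcard mapping can be illustrated with a toy model in Python. This is a simplification, not how IIS is implemented; the extension set only contains the two extensions named above, and handled_by_aspnet is a made-up name.

```python
import posixpath

# Only the extensions mentioned above; a real server maps more.
ASPNET_EXTENSIONS = {".aspx", ".ashx"}


def handled_by_aspnet(path: str, wildcard: bool = False) -> bool:
    """Return True if the request would be handed to the ASP.NET engine."""
    if wildcard:
        return True  # wildcard mapping sends every request through ASP.NET
    extension = posixpath.splitext(path)[1].lower()
    return extension in ASPNET_EXTENSIONS
```

With the default mapping, an extensionless URL such as /products/1 never reaches ASP.NET, which is why one of the two workarounds is needed on IIS 6.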
I wrote how to do it with IIS 6 a while back, http://professionalaspnet.com/archive/2007/07/27/Configure-IIS-for-Wildcard-Extensions-in-ASP.NET.aspx.
You are right in doing your mapping from the database. RegEx rewriting, as used out of the box in MVC, is more limiting, because it more or less forces you to put the primary key in the URL and does not have a good way to map characters that are not allowed in URLs, like '.
Have you checked out the ASP.NET MVC Framework? With that framework, all your URLs are automatically mapped to controllers, which can perform any desired action (including redirecting to other URLs or controllers). You can also set up custom routes with custom parameters. If you haven't seen it yet, it may be worth a look.