How to specify servlet error-page per path - servlets

web.xml's <error-page> allows a developer to specify what to return to the client in case of an error (either an HTTP status or a Java exception).
But I have 2 different 404 error pages, one per locale.
My web application is structured so that all resources for locale A are under path /a/ and all resources for locale B are under path /b/.
I'd like to have a localized error page for 404 when trying to access pages under each locale (to be clear, trying to access /a/some-undefined-resource should return 404 + an error page localized for locale A).
Given other limitations, it is not really possible to deploy 2 separate applications, a.war and b.war for each locale.
How can I serve an error page that depends on original resource requested?

I ended up giving up on the idea of using <error-page> to serve my SPA and used the urlrewrite filter to rewrite URLs to either /a/index.html or /b/index.html.
The downside is that now I have two places to edit when I add new routes inside my SPA:
angular routes (app-routing.module.ts), and
urlrewrite.xml.
Besides, if I add a new locale in the future, I'll need to add a new set of rules to cover all my routes.
Also I don't know how this filter will impact my site's performance. Since it is a low traffic project, I'll keep it this way until I find a better solution.
There is an upside too: now those - otherwise legal - requests are served with status 200, and really missing resources are signaled with 404 and no default content included.
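One way to avoid listing every route per locale is to derive the entry point from the path prefix. The sketch below is only an illustration in JavaScript, not a urlrewrite configuration: the function name spaFallback, the LOCALES list, and the "has a file extension means static asset" heuristic are all assumptions. Note the trade-off: an unknown route under a known locale would again be answered with index.html instead of a 404.

```javascript
// Hypothetical sketch: pick the SPA entry point from the requested path.
// The locale prefixes come from the question; everything else is assumed.
const LOCALES = ['a', 'b'];

// Returns the index.html to serve for a client-side route, or null when the
// request should fall through (static asset, or no known locale prefix).
function spaFallback(path) {
  const match = /^\/([^/]+)(\/.*)?$/.exec(path);
  if (!match || !LOCALES.includes(match[1])) return null;
  // Assumption: a last segment containing a dot is a real file (main.js, logo.png).
  const lastSegment = path.split('/').pop();
  if (lastSegment.includes('.')) return null;
  return `/${match[1]}/index.html`;
}
```

With this shape, adding a locale means adding one entry to LOCALES rather than a new set of rules.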

Related

Do any CDNs allow rewriting request URIs so that client-side routing plays nicely with browser refreshes?

I have an HTML5 app written in static html/js/css (it's actually written in Dart, but compiles down to javascript). I'm serving the application files via CDN, with the REST api hosted on a separate domain. The app uses client-side routing, so as the user goes about using the app, the url might change to be something like http://www.myapp.com/categories. The problem is, if the user refreshes the page, it results in a 404.
Are there any CDNs that would allow me to create a rule that, if the user requests a page that is a valid client-side route, would just return the (in my case) client.html page?
More detailed explanation/example
The static files for my web app are stored on S3 and served via Amazon's CloudFront CDN. There is a single HTML file that bootstraps the application, client.html. This is the default file served when visiting the domain root, so if you go to www.mysite.com the browser is actually served www.mysite.com/client.html.
The web app uses client-side routing. Once the app loads and the user starts navigating, the URL is updated. These urls don't actually exist on the CDN. For example, if the user wanted to browse widgets, she would click a button, client-side routing would display the "widgets" view, and the browser's url would update to www.mysite.com/widgets/browse. On the CDN, /widgets/browse doesn't actually exist, so if the user hits the refresh button on the browser, they get a 404.
My question is whether or not any CDNs support looking at the request URI and rewriting it. So, I could see a request for /widgets/browse and rewrite it to /client.html. That way, the application would be served instead of returning a 404.
I realize there are other solutions to this problem, namely placing a server in front of the CDN, but it's less ideal.
I do this using CloudFront, but I use my own server running Apache to accomplish this. I realize you're using a server with Amazon, but since you didn't specify that you're restricted to that, I figured I'd answer with how to accomplish what you're looking to do anyway.
It's pretty simple. Any time you request something that isn't already in the CloudFront cache, or that exists in the cache but has expired, CloudFront goes back to your web server and asks it to serve up the content. At this point, you have total control over the request. I use mod_rewrite in Apache to capture the request, then determine what content to serve depending on the request. In fact, there isn't a single file (save one PHP script) on my server, yet CloudFront believes there are thousands. I'm pretty sure URL rewriting is standard on most web servers, though I can only confirm it for lighttpd and Apache from my own experience.
More Info
All you're doing here is just telling your server to rewrite incoming requests in order to satisfy them. This would not be considered a proxy or anything of the sort.
The flow of content between your app and your server, with cloudfront in between is like this:
appRequest->cloudFront
If cloudFront has the file, return data to the user without asking your server for the file.
If cloudFront DOESN'T have the file (or it has expired), go back to the origin server and ask it for a new copy to cache.
So basically, what is happening in your situation is this:
A) app -> asks cloudFront for a URL cloudFront doesn't have
B) cloudFront then asks your source server for the file
C) file doesn't exist there, so the server tells cloudFront to go fly a kite
D) cloudFront comes back empty handed and makes your app 404
E) app crashes and burns, users run away and use something else.
So, all you're doing with mod_rewrite is telling your server how it can re-interpret certain formatted requests and act accordingly. You could point all .jpg requests to singleImage.jpg, then have your app ask for:
www.mydomain.com/image3.jpg
www.mydomain.com/naughtystuff.jpg
Neither of those images even has to exist on your server. Apache would just honor the request by sending back singleImage.jpg. But as far as cloudfront or your app is concerned, those are two different files residing at two different unique places on the server.
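To make that flow concrete, here is a toy model of it in JavaScript. It is not CloudFront's or Apache's API: originFetch, edgeFetch, and the 'JPEG-BYTES' payload are invented for the illustration, and cache expiry is left out.

```javascript
// Toy model of the edge/origin flow; names and payloads are made up.
// Origin: the only real file is singleImage.jpg.
const originFiles = new Map([['/singleImage.jpg', 'JPEG-BYTES']]);

function originFetch(path) {
  // Rough equivalent of a rewrite rule that maps any *.jpg to /singleImage.jpg.
  const rewritten = /\.jpg$/i.test(path) ? '/singleImage.jpg' : path;
  return originFiles.has(rewritten)
    ? { status: 200, body: originFiles.get(rewritten) }
    : { status: 404, body: '' };
}

// Edge: answer from the cache when possible, otherwise ask the origin
// and keep successful responses under the URL that was requested.
const edgeCache = new Map();
let originHits = 0;

function edgeFetch(path) {
  if (edgeCache.has(path)) return edgeCache.get(path);
  originHits += 1;
  const response = originFetch(path);
  if (response.status === 200) edgeCache.set(path, response);
  return response;
}
```

Requesting /image3.jpg and /naughtystuff.jpg produces two separate cache entries with the same body, which is the "two different files" effect described above.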
Hope this clears it up.
http://httpd.apache.org/docs/current/mod/mod_rewrite.html
I think you are using the URL structure in the wrong way. The path, which is defined by forward slashes, is supposed to bring you to a specific resource, in your example client.html. For routing beyond that point (within that resource), however, you should make use of the #, as is done in many JavaScript frameworks. This tells your router what the state of the resource (your HTML page or app) is. If other resources are referenced, e.g. images, then you should provide different paths for them, which would go through the CDN.

AngularJS routing hides 404 responses for nonexistent routes

I have noticed that GET requests for nonexistent paths don't return a 404 response. Instead, the client gets a "200 OK", AngularJS renders the main view, and rewrites the path to /. A request for a nonsense URI is logged as successful in the server logs. If I understand correctly, the problem is that since AngularJS handles routing, the server has to accept a GET request for any URI and always respond by serving the client side of the app ("200 OK" or "304 Not Modified").
For example, using the project scaffolded by the angular-fullstack Yeoman generator, requesting a nonexistent /unicorn goes like this:
GET /unicorn 200 31ms - 3.29kb
GET /partials/main 304 36ms
GET /api/awesomeThings 304 5ms
The Express route that handles the request looks like this:
// server, last route:
app.get('*', controllers.index);
// controllers:
exports.index = function(req, res) {
  res.render('index');
};
and index.jade is the root of the whole client side of the app.
After a quick look at the server side code of other AngularJS / Express projects on Github (AngularJS Express seed, AngularJS login), I see that this is a common pattern. I am wondering if there is a better way to handle requests for nonexistent paths, so that the client gets a real HTTP 404 response?
The angular documentation has a section about the routing. Also, this question and this question have some information that pertains to IIS but could easily be adapted to express.
Html link rewriting
When you use HTML5 history API mode, you will need different links in different browsers, but all you have to do is specify regular URL links, such as: <a href="/some?foo=bar">link</a>
When a user clicks on this link,
In a legacy browser, the URL changes to /index.html#!/some?foo=bar
In a modern browser, the URL changes to /some?foo=bar
In cases like the following, links are not rewritten; instead, the browser will perform a full page reload to the original link.
Links that contain target element
Example: link
Absolute links that go to a different domain
Example: link
Links starting with '/' that lead to a different base path when base is defined
Example: link
When running Angular in the root of a domain, along side perhaps a normal application in the same directory, the "otherwise" route handler will try to handle all the URLs, including ones that map to static files.
To prevent this, you can set your base href for the app to <base href="."> and then prefix links to URLs that should be handled with ".". Now, links to locations which are not to be routed by Angular are not prefixed with "." and will not be intercepted by the otherwise rule in your $routeProvider.
Server side
Using this mode requires URL rewriting on the server side. Basically, you have to rewrite all your links to the entry point of your application (e.g. index.html).
You can use the $routeProvider.otherwise() function to decide what to do with undefined routes. If you still want to show a 404 message, you could simply set a /404.html route both in this function and in Express.
This is actually express handling routing--not angular. Remove the app.get('*', ... that you found to disable that.
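If the catch-all is kept rather than removed, one option is to let it answer 200 only for paths the client actually routes. This is a sketch under assumptions: the CLIENT_ROUTES list and the '404' view name are hypothetical, and the list has to be kept in sync with the AngularJS route table by hand.

```javascript
// Assumed list of client-side routes; it must mirror the AngularJS routes.
const CLIENT_ROUTES = ['/', '/login', '/settings'];

// Decide what the catch-all should answer for a GET on `pathname`.
function resolveCatchAll(pathname) {
  // Treat a trailing slash as equivalent, except for the root itself.
  const normalized = pathname.length > 1 ? pathname.replace(/\/+$/, '') : pathname;
  return CLIENT_ROUTES.includes(normalized)
    ? { status: 200, view: 'index' }
    : { status: 404, view: '404' };
}

// Possible wiring into the catch-all from the question (Express is assumed,
// it is not loaded in this sketch):
// app.get('*', function(req, res) {
//   var result = resolveCatchAll(req.path);
//   res.status(result.status).render(result.view);
// });
```

With this in place, a request for /unicorn would be logged with a 404 while known routes still get the app shell.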

Anatomy of G-WAN URI servlets

gwan/csp/strangesubfolder/inc.c can be visited via http://domainName.com/strangesubfolder/?inc
I find this servlet mapping strange, but it suits my needs. I can't find the mapping description in the G-WAN user's manual.
Please correct me if I am wrong and confirm if it is the expected behavior.
Yes it is a standard feature.
The '?' tells G-WAN that it is a servlet. If there's no '?' it will look for the file in WWW folder.
Update:
Now I understand your confusion.
Since version 3.3.27 this has been changed so that users can easily make RESTful URLs.
G-WAN timeline
Read the update for March 27 2012.
Now you need to place the '?' before the actual servlet name. By doing this, G-WAN can efficiently rewrite '/' to '&', so you can use RESTful URLs like these without writing any code.
//Old way
http://domain/?user/profile&user1
http://domain/?blog/archive&2012&march
//New way (more restful no '&')
http://domain/user/?profile/user1
http://domain/blog/?archive/2012/march
Yes, as Richard rightly (and promptly, thanks Richard!) explained it, this is the expected behavior.
The directory /gwan/.../csp/script.c is used to store servlets that must be run while /gwan/.../www/script.c is used to store files intended to be served as an HTTP resource.
The corresponding URLs are GET /?script.c and GET /script.c.
Any sub-directory used in the /csp or /www folders is reflected accordingly in the HTTP request: GET /folder/?script.c for dynamic contents and GET /folder/script.c for static contents.
The choice of moving the '?' query character (which can be replaced by other characters) from the old GET /csp?/folder/script.c form to the new GET /folder/?script.c form was motivated by the need to:
distinguish servlet names from folder names (requests can lack the servlet extension for the defined 'default' programming language, which is C if nothing is defined)
allow any number of sub-directories in HTTP requests
allow any number of query arguments in HTTP queries
distinguish between folders and query arguments in HTTP requests
make it possible to have RESTFUL requests in all the above cases.
It took us a while to find the proper mix of features with the minimal verbosity but experience has shown that this works well.
Here is an example of a RESTFUL query having both a sub-folder and query arguments:
GET /folder/?script/arg1/value1/arg2/value2/arg3/value3
By default, this is a C script, unless another language (among the 15 available for scripting) has been defined as the 'default' language.
Note that the 50+ script examples provided in the download archive illustrate this scheme which is also presented on the developers page.
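To illustrate how such a request splits into its parts, here is a small parser written in JavaScript. It is not G-WAN code and says nothing about how G-WAN does this internally; the function name and the returned field names are made up for the example.

```javascript
// Illustrative parser for the "new" URL form: /folder/?servlet/arg/arg/...
// Not G-WAN code; the field names are invented for this example.
function parseServletUrl(url) {
  const mark = url.indexOf('?');
  if (mark === -1) return null; // no '?': a static file request
  const folder = url.slice(0, mark); // everything before the '?', e.g. "/folder/"
  const [servlet, ...args] = url.slice(mark + 1).split('/');
  return { folder, servlet, args };
}
```

For the request shown above, folder is "/folder/", servlet is "script", and args holds the six remaining path segments; a request such as /folder/script.c has no '?' and is treated as static content.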

HTTP POST to external ASPX form causing a HTTP 405 error in some cases

I have a situation where we're aggregating what amounts to marketing data from N clients, where a client can host an HTML form using any backend of their choice, each with the action of the form pointing to a path that we're hosting. Each client has a different URL, there's no auth (but there is some simple validation of the data) and it's all generally working just fine.
However, there's one small wrinkle that I can't seem to get my head around.
The aspx that is processing the submitted data resides at a path, let's call it ~/submit/default.aspx. The idea is that we should be able to hand our partner a URL along the lines of "http://sample.com/submit/?foo=bar" as the action of their form. Doing this, however, results in an HTTP 405 error, "Resource not allowed".
Having the action of the form set as "http://sample.com/submit/default.aspx" works just fine and dandy however.
Default.aspx is set as one of the default document names in IIS 6.
The .aspx file extension is properly mapped to the correct .NET DLL and has the verbs GET, HEAD, POST, and DEBUG activated for the mapping.
Those were the only two things I could think of to double check first. Does anyone else have any ideas? I'd have preferred to use URL rewriting / routing with IIS7, but that's unfortunately not an option, and I have a number of additional requirements where "clean" URLs will be highly preferable, so solving this is going to be a pretty core problem to get through.
IIRC, IIS will only use the default docs if the requested resource is a directory. Since the requested resource in the first case is not, it'll never make it through the default doc handlers - instead failing on a POST to an unregistered script extension (405).
It may depend on the document type of "http://sample.com/submit/?foo=bar". If your IIS doesn't know how to handle the document type being returned to it (which it then returns to you, the client), you may get an HTTP 405 error, which means that it doesn't know how to handle that document type, server-wise. Maybe try putting something like
<httpHandlers>
<add verb="*" path="*.doc" type="your.class.to.handle.doc.files"/>
</httpHandlers>
in your web.config file that drives the app. HTTP handlers are modular pieces of code, written and compiled in a .NET language, and act as a kind of 'servlet' if you're familiar with Java terms. A handler is a piece of code that writes something out to the client, in your case maybe a rendering of a .doc file, found programmatically in your handler class.

ASP.NET 404 (page not found) redirection with original parameters preserved

I'm replacing an old web application with a new one, with different structure.
I cannot change the virtual directory path for the new app, as I have users who have bookmarked different links to the old app.
Let's say I have a user who has this bookmark:
http://server/webapp/oldpage.aspx?data=somedata
My new app is going to reside in the same virtual directory, replacing the old one, but it has no longer oldpage.aspx, instead it has different layout, but it still needs the parameter from the old url.
So, I have set to redirect 404 errors to redirectfrombookmark.aspx, where I decide how to process the request.
The problem is, that the only parameter I receive is "aspxerrorpath=/webapp/oldpage.aspx", but not the "data" parameter, and I need it to correctly process the request.
Any idea how I can get the full "original" url in the 404 handler?
EDIT: reading the answers, looks like I did not make the question clear enough:
The users have bookmarked many different pages (oldpage1, oldpage2, etc.) and I should handle them equally.
The parameters for each old page are almost the same, and I need only specific ones.
I want to re-use the "old" virtual directory name for the "new" application.
The search bots, etc., are not a concern, this is internal application with dynamic content, which expires very often.
The question is: can I do this without creating a bunch of empty pages in my "new" application with the old names and a Response.Redirect in their OnLoad? That is, can this be done using the 404 mechanism, or some event handling in Global.asax, etc.?
For the purposes of SEO, you should never redirect on a 404 error. A 404 should be a dead-end, with some helpful information on how to locate the page you're looking for, such as a site map.
You should be using a 301, moved permanently. This allows the search bots to update their index without losing the page rank assigned to the original page.
See: http://www.webconfs.com/how-to-redirect-a-webpage.php on how to code this type of response.
You could look into the UrlRewritingNet component.
You should also look into using some of the events in your Global.asax file to check for errors and redirect intelligently. The OnError event is what you want to work with. You will have the variables from the request at that point in time (under the HttpContext object), and you can have your code work there instead of in a 404 page. If you go this route, be sure you handle the 404 correctly for anything other than the old pages.
I am sorry I don't have any explicit examples or information right now, hopefully this will point you in the right direction.
POST and GET parameters are only available per request. If you already know the name of the old page (OldPage.aspx), why not just add a custom redirect in it?
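Whichever hook ends up seeing the original request, the mapping itself is small. Below is a language-neutral sketch of it, written in JavaScript rather than ASP.NET: the target page /webapp/newpage and the oldpage pattern are assumptions, and it presumes the full original URL (path plus query string) is available at the point where it runs, which is exactly what aspxerrorpath alone does not provide.

```javascript
// Sketch only: "/webapp/newpage" is a hypothetical target page.
// Maps an old bookmarked URL to the new location, keeping only the
// "data" parameter. Returns null for anything that is a genuine 404.
function redirectTarget(rawUrl) {
  // The base is only used when rawUrl is relative.
  const original = new URL(rawUrl, 'http://server');
  if (!/\/oldpage\d*\.aspx$/i.test(original.pathname)) return null;
  const target = new URL('/webapp/newpage', original);
  const data = original.searchParams.get('data');
  if (data !== null) target.searchParams.set('data', data);
  return target.pathname + target.search;
}
```

The same rule covers oldpage1, oldpage2, and so on without creating a placeholder page for each of them.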
