ASP.Net Relative Redirects and Resource Paths - asp.net

We are working on the conversion of an ASP site to ASP.Net and are running into problems with redirects and resource locations. Our issues are coming from a peculiarity of our set-up. Our site can be accessed in two ways:
Directly by URL: http://www.mysite.com - in this case everything works fine
Via a proxy server with a URL like: http://www.proxy.com/mysite_proxy/proxy/
In #2 "mysite_proxy" is a mapping on proxy.com that directs the request behind the scenes to www.mysite.com, "proxy" is a virtual sub-website that just redirects the request to the root of www.mysite.com. It essntially is meant to give us a convenient way of knowing if a request is hitting the site from the proxy or not.
We are running into two problems with this setup:
Using Response.Redirect either with the "~" or a plain relative path (Default.aspx) generates a 302 response with a location of "/proxy/rest_of_the_path.aspx." This causes the browser to request http://www.proxy.com/proxy/rest_of_the_path.aspx which isn't anything and doesn't even hit our server so we couldn't do an after the fact re-write.
Using "~" based URLs in our pages for links, images, style-sheets, etc. creates the same kind of path: "/proxy/path_to_resources.css." We could probably solve some of these by using relative paths for all these resources though that would be a lot of work and it would do nothing to address similar resource links generated by the framework and 3rd party components.
Ideally I want to find a global fix that will make these problems transparent to the developers working on the site. I have a few ideas at this point:
Getting rid of the proxy, it is not really needed and is there for administrative and not technical reasons. Easiest to accomplish technically, the hardest to accomplish in the real world.
Hand the problem off to the group that runs the proxy and say it is their problem they need to fix it.
Use a Response filter to modify the raw html before it is sent to the client. I know this could fix my resource links, but I am not certain about the headers (need to test it out) and there would be a performance hit to having to parse every response looking for and re-writing urls.
All of these solutions have big negatives in my mind and I was hoping someone might have another idea. So any thoughts?
Aside: there are a lot of posts up already that deal with the reverse of this issue: I have a relative URL, how do I may it absolute, but I didn't come across anything that fit the bill for the other direction.

As a fix, I'd go with a small detection routine at Global.asax:Session_Start (since i imagine that the proxy doesn't actually starts another application instance), set a session variable with the correct path, and use it instead of '~'.
In the case a different application instance is used, then use Application_Start instead of Session_Start and a static Global variable instead of a Session variable.

Related

Do any CDNs allow rewriting request URI's so that client-side routing plays nicely with browser refreshes?

I have an HTML5 app written in static html/js/css (it's actually written in Dart, but compiles down to javascript). I'm serving the application files via CDN, with the REST api hosted on a separate domain. The app uses client-side routing, so as the user goes about using the app, the url might change to be something like http://www.myapp.com/categories. The problem is, if the user refreshes the page, it results in a 404.
Are there any CDN's that would allow me to create a rule that, if the user requests a page that is a valid client-side route, it would just return the (in my case) client.html page?
More detailed explanation/example
The static files for my web app are stored on S3 and served via Amazon's CloudFront CDN. There is a single HTML file that bootstraps the application, client.html. This is the default file served when visiting the domain root, so if you go to www.mysite.com the browser is actually served www.mysite.com/client.html.
The web app uses client-side routing. Once the app loads and the user starts navigating, the URL is updated. These urls don't actually exist on the CDN. For example, if the user wanted to browse widgets, she would click a button, client-side routing would display the "widgets" view, and the browser's url would update to www.mysite.com/widgets/browse. On the CDN, /widgets/browse doesn't actually exist, so if the user hits the refresh button on the browser, they get a 404.
My question is whether or not any CDNs support looking at the request URI and rewriting it. So, I could see a request for /widgets/browse and rewrite it to /client.html. That way, the application would be served instead of returning a 404.
I realize there are other solutions to this problem, namely placing a server in front of the CDN, but it's less ideal.
I do this using CloudFront, but I use my own server running Apache to accomplish this. I realize you're using a server with Amazon, but since you didn't specify that you're restricted to that, I figured I'd answer with how to accomplish what you're looking to do anyway.
It's pretty simple. Any time you query something that isn't already in the cache on CloudFront, or exists in the Cache but is expired, CloudFront goes back to your web server asking it to serve up the content. At this point, you have total control over the request. I use the mod_rewrite in Apache to capture the request, then determine what content I'm going to serve depending on the request. In fact, there isn't a single file (spare one php script) on my server, yet cloudfront believes there are thousands. Pretty sure url rewriting is standard on most web servers, I can only confirm on lighttp and apache from my own experience though.
More Info
All you're doing here is just telling your server to rewrite incoming requests in order to satisfy them. This would not be considered a proxy or anything of the sort.
The flow of content between your app and your server, with cloudfront in between is like this:
appRequest->cloudFront
if cloudFront has file, return data to user without asking your server
for the file.
If cloudFront DOESN'T have the file (or it has expired), go back to
the origin server and ask it for a new copy to cache.
So basically, what is happening in your situation is this:
A)app->ask cloudfront for url cloud front doesn't have
B)cloudfront
then asks your source server for the file
C)file doesn't exist there,
so the server tells cloudFront to go fly a kite
D)cloudFront comes back empty handed and makes your app 404
E)app crashes and
burns, users run away and use something else.
So, all you're doing with mod_rewrite is telling your server how it can re-interpret certain formatted requests and act accordingly. You could point all .jpg requests to point to singleImage.jpg, then have your app ask for:
www.mydomain.com/image3.jpg
www.mydomain.com/naughtystuff.jpg
Neither of those images even have to exist on your server. Apache would just honor the request by sending back singleImage.jpg. But as far as cloudfront or your app is concerned, those are two different files residing at two different unique places on the server.
Hope this clears it up.
http://httpd.apache.org/docs/current/mod/mod_rewrite.html
I think you are using the URL structure in a wrong way. the path which is defined by forward slashes is supposed to bring you to a specific resource, in your example client.html. However, for routing beyond that point (within that resource) you should make use of the # - as is done in many javascript frameworks. This should tell your router what the state of the resource (your html page or app) is. if there are other resources referenced, e.g. images, then you should provide different paths for them which would go through the CDN.

Alternative to Response.Redirect to effect a subdomain

I have a site that is hosted in shared hosting environment. They use a wildcard subdomain setup and suggest using Response.Redirect to achieve the illusion of a subdomain.
Is there a way of doing this such that the "switch" takes place on the server rather than bouncing back down to the browser first?
Server.Transfer only works if I transfer to an actual resource. So redirecting from sub1.mydomain.com to www.mydomain.com/public/ does not work. I'd have to redirect to www.mydomain.com/public/mypage.aspx instead which i dont want to do.
To ensure that the "switch" takes place on the server, you could create a simple HTTP Module to intercept each request, inspect the requested URL and then forward them as needed . All your module has to do is handle the OnBeginRequest event, and then forward the request. In this way you could really have unlimited sub-domains.
Also might want add a blank host header, so that any requests for subdomains not listed get forwarded to the proper default website
If you aren't familiar with them, modules are very simple to create and work with.
Heres a link to a very similar implementation by Brendan Tompkins:
http://codebetter.com/blogs/brendan.tompkins/archive/2006/06/27/146875.aspx
You could also do some URL rewriting in the module should you need specific URL "look" behavior.

Are there any safe assumptions to make about the availability of a URL?

I am trying to determine if there is a way to check the availability of a potentially large list of urls (> 1000000) without having to send a GET request to every single one.
Is it safe to assume that if http://www.example.com is inaccessible (as in unable to connect to server or the DNS request for the domain fails), or I get a 4XX or 5XX response, then anything from that domain will also be inaccessible (e.g. http://www.example.com/some/path/to/a/resource/named/whatever.jpg)? Would a 302 response (say for whatever.jpg) be enough to invalidate the first assumption? I imagine sub domains should be considered distinct as http://subdomain.example.com and http://www.example.com may not direct to the same ip?
I seem to be able to think of a counter example for each shortcut I come up with. Should I just bite the bullet and send out GET requests to every URL?
Unfortunately, no you cannot infer anything from 4xx or 5xx or any other codes.
Those codes are for individual pages, not for the server. It's quite possible that one page is down and another is up, or one has a 500 server-side error and another doesn't.
What you can do is use HEAD instead of GET. That retrieves the MIME header for the page but not the page content. This saves time server-side (because it doesn't have to render the page) and for yourself (because you don't have to buffer and then discard content).
Also I suggest you use keep-alive to accelerate responses from the same server. Many HTTP client libraries will do this for you.
A failed DNS lookup for a host (e.g. www.example.com) should be enough to invalidate all URLs for that host. Subdomains or other hosts would have to be checked separately though.
A 4xx code might tell you that a particular page isn't available, but you couldn't make any assumptions about other pages from that.
A 5xx code really won't tell you anything. For example, it could be that the page is there, but the server is just too busy at the moment. If you try it again later it might work fine.
The only assumption you should make about the availability of an URL is that "Getting an URL can and will fail".
It's not safe to assume that a sub domain request will fail when a parent one does. Namely because inbetween your two requests your network connection can go up, down or generally misbehave. It's also possible for the domains to be changed in between requests.
Ignoring all internet connection issues. You are still dealing with a live web site that can and will change constantly. What is true now might not be true in 5 minutes when they decide to alter their page structure or change the way the display a particular page. Your best bet is to assume any get will fail.
This may seem like an extreme view point. But these events will happen. How you handle them will determine the robustness of your program.
First don't assume anything based on a single page failing. I have seen many cases where IIS will continue to serve static content but not be able to serve any dynamic content.
You have to treat each host name as unique you cannot assume subdomain.example.com and example.com point to the same IP. Or even if they do there is no guarentee that are the same site. IIS again has host headers that allows you to run multiple sites using a single IP Address.
If the connection to the server actually fails, then there's no reason to check URLs on that server. Otherwise, you can't assume anything.
In addition to what everyone else is saying, use HEAD requests instead of GET requests. They function the same, but the response doesn't contain the message body, so you save everyone some bandwidth.

Uri.EscapeDataString(Request.Url.AbsoluteUri) is wrong in different environment, What else should I use?

I am generating a return url link so when the user hits the close button on this page they return to this ReturnUrl.
ie
http://localhost:42605/Search.aspx?ReturnUrl=http%3A%2F%2Flocalhost%3A42605%2FStuff%2FViewStuff.aspx%3FProjectId%3D2246
This works fine in dev environment, but in uat environment I have
http://app-uat.com/Search.aspx?ReturnUrl=http%3A%2F%2Fapp-uat-01.com%2FStuff%2FViewStuff.aspx%3FProjectId%3D2246
Notice the extra 01 on the ReturnUrl parameter.
So to generate the ReturnUrl bit I'm currently using
Uri.EscapeDataString(Request.Url.AbsoluteUri)
As I dont have direct access to the uat environment I dont know what will definetly work until after a release cycle so if I can avoid getting it wrong the first time that would be helpful.
Looking at Request.Url in debug the possibilities I have
DnsSafeHost or
Host
which could be used with
AbsolutePath or
LocalPath or
PathAndQuery
or I have
OriginalString
Or I maybe I could use the referer instead?
PathAndQuery gives you a relative path with querystring vars so you don't have to worry about the extra '01' on the domain. Just deal with relative paths.
Out of curiosity, is your uat environment load balanced? It appears that your app is confused as to which domain it's responding on. This could(possibly) happen if a request hits a balanced server directly versus the load balance point or the load balancer forwards requests to machine-specific domain?
I would request info from your network admin. If you describe what you're doing in great detail, especially that your app sometimes believes it's responding on app-uat-01, they might see the issue right away.

Is it safe to redirect to the same URL?

I have URLs of the form http://domain/image/⟨uuid⟩/42x42/some_name.png. The Web server (nginx) is configured to look for a file /some/path/image/⟨uuid⟩/thumbnail_42x42.png, and if it does not exist, it sends the URL to the backend (Django via mod_wsgi) which then generates the thumbnail. Then the backend emits a 302 redirect to exactly the same URL that was requested by the client, with the idea that upon this second request the server will notice the thumbnail file and send it directly.
The question is, will this work with all the browsers? So far testing has shown no problems, but can I be sure all the user agents will interpret this as intended?
Update: Let me clarify the intent. Currently this works as follows:
The client requests a thumbnail of an image.
The server sees the file does not exist, so it forwards the request to the backend.
The backend creates the thumbnail and returns 302.
The backend releases all the resources, letting the server share the newly generated file to current and subsequent clients.
Having the backend serve the newly created image is worse for two reasons:
Two ways of serving the same data must be created;
The server is much better at serving static content. What if the client has an extremely slow link? The backend is not particularly fast nor memory-efficient, and keeping it in memory while spoon-feeding the client can be wasteful.
So I keep the backend working for the minimum amount of time.
Update²: I’d really appreciate some RFC references or opinions of someone with experience with lots of browsers. All those affirmative answers are pleasant but they look somewhat groundless.
If it doesn't, the client's broken. Most clients will follow redirect loops until a maximum value. So yes, it should be fine until your backend doesn't generate the thumbnail for any reason.
You could instead change URLs to be http://domain/djangoapp/generate_thumbnail and that'll return the thumbnail and the proper content-type and so on
Yes, it's fine to re-direct to the same URI as you were at previously.

Resources