Using customErrors for vanity URLs / ASP.NET URL redirection

So, from here...
In ASP.NET, you have a choice about how to respond to that; it's in the web.config as customErrors. Turn that on, then redirect to a fancy 404 page (maybe you already do). The fancy 404 page could then check the originally requested URL (which gets passed to the custom error page as a querystring parameter) to see if it's a valid redirect, lives in your database, etc. Just do a Response.Redirect() from there.
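A minimal sketch of that flow, assuming a Web Forms app where NotFound.aspx is registered as the 404 page (the redirect lookup is hypothetical):

    // web.config (inside <system.web>):
    //   <customErrors mode="On">
    //     <error statusCode="404" redirect="~/NotFound.aspx" />
    //   </customErrors>

    // NotFound.aspx.cs - ASP.NET passes the original path as "aspxerrorpath".
    protected void Page_Load(object sender, EventArgs e)
    {
        string originalPath = Request.QueryString["aspxerrorpath"];
        if (!string.IsNullOrEmpty(originalPath))
        {
            // LookupVanityUrl is a hypothetical lookup against your redirect table.
            string target = LookupVanityUrl(originalPath);
            if (target != null)
                Response.Redirect(target);
        }
        // Otherwise fall through and render the friendly 404 content.
    }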
Then schooner writes:
Thanks, we do have a 404 now, but we would prefer this not be detected as a 404 in the process. We would like to handle it directly and separately if possible.
...and I'd like to know just how bad a practice this is. I don't expect to put my "pretty" URLs on the internet (just on business cards), and I have a sample of 404-redirecting-to-a-helpful-site code working, but I don't want to get to production and have an issue with a browser that takes the initial 404 too seriously. Can anyone help me understand more about why I wouldn't want to use customErrors / 404 to flow users to the page they actually wanted?

The main problem with using customErrors as your 404 handler is that every time customErrors picks up an errored request, rather than returning a 404 to the browser to signal a bad request, it returns a 302, which indicates that the page has been relocated to whatever your customErrors page is. This isn't bad for most users, because they don't know or even notice the difference; the problem comes from the fact that web crawlers DO know the difference, and the status code they receive directly affects how their indexing works.
Consider the scenario where you have a page at http://mysite.com/MyAwesomePageAboutStuff.aspx for some period of time, and then one day you decide you no longer need it and delete the file. If Google or some other crawler has already indexed that URL and goes back to it after you delete it, the crawler will get a 302 status code instead of a 404 error, and because of this status code it will update the page's URL to point to your error page rather than deleting the dead link. Now, whenever someone finds that URL by way of a search engine, they'll end up at your error page.
It's not really a huge issue, but you can definitely see the headaches this can create for your users in the long run.
Look here for some corroborating data.
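If you want the friendly page without the misleading 302, a common mitigation (a sketch, assuming IIS 7+ in integrated pipeline mode) is to keep serving the helpful content but reassert the real status code from the error page itself:

    // In the custom error page's Page_Load:
    Response.StatusCode = 404;
    Response.TrySkipIisCustomErrors = true; // stop IIS from swapping in its own error body
    // ...then render the helpful "did you mean" content as usual.

Browsers still show your page, while crawlers see the URL as genuinely gone.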

I created a vanity URL system using the 404 as the handler. There's no need for a 302 on my side, as the 404 handler dynamically loads the content and returns that. I am fully able to handle any and all POST / GET and SERVER data.
Works great. If you are interested, TarantulaHawk is up on SourceForge.
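A rough sketch of that in-place approach (MapVanityPath is a hypothetical lookup; Server.Transfer keeps the whole request, including form data, on the server side):

    // Inside the page registered as the 404 handler:
    string requested = Request.QueryString["aspxerrorpath"];
    string realPage = MapVanityPath(requested); // hypothetical lookup
    if (realPage != null)
    {
        Response.StatusCode = 200;       // we found content after all
        Server.Transfer(realPage, true); // true = preserve form and querystring
    }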

Related

404 error on my homepage although I can see my site

I am at my wits' end with the following problem:
My site www.sebastianthalhammer.com is available under that URL without any problems.
However, Google Search Console as well as other external third-party test tools return a 404 error.
Status report from Uptrends
It is just the main page that's affected; all the other subpages and blog content are fine.
I have been in contact with the server staff, but everything seems alright to them. As mentioned, the site can be reached. The site runs on WordPress, latest version.
I have no real clue where to start, as this error seems to be quite a tricky one. Does anyone here have an idea what's going on?
Sebastian
The 4xx class of status code is intended for cases in which the client seems to have erred. Except when responding to a HEAD request, the server SHOULD include a representation containing an explanation of the error situation, and whether it is a temporary or permanent condition. These status codes are applicable to any request method. User agents SHOULD display any included representation to the user.
This leaves me with two possible explanations:
Explanation 1: it's a server error.
the server wrongly returns a 404 status code
the browser thinks the response body contains details about the error and displays it - for the end user this is the actual page
Explanation 2: it's done on purpose to defeat crawlers and page watchers.
the server returns 404 on purpose - non-browser user agents won't process the result, as they interpret it as an error
browsers are unaffected, the end user doesn't care as long as the page is being displayed
The second one would indeed be kind of clever if you don't want your page to be indexed.
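To check which explanation applies, you can request the page the way the test tools do and look only at the status line (a minimal C# sketch; any HTTP client shows the same thing):

    using var client = new System.Net.Http.HttpClient();
    using var response = client.GetAsync("https://www.sebastianthalhammer.com/")
                               .GetAwaiter().GetResult();
    // A browser renders the body either way; crawlers and test tools act on this line.
    System.Console.WriteLine((int)response.StatusCode + " " + response.ReasonPhrase);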
Thanks to your feedback I could think about the problem in a different way.
Ultimately, in the unholy depths of a certain plugin, I dug out a setting that caused the error.
It was a redirection plugin that (for whatever reason) sent out a 404 status when the URL was requested.
I don't know what the purpose of that would be. All I know is that the setting had been at its default for quite a while, and it caused this weird situation.
Thanks, guys, for getting me on the right track.
Sebastian

How can a bot get the contents of subsequent pages in a category listing in WordPress?

I'm writing a bot to automatically download pages from my WordPress blog. The bot gets most of the pages without a problem. For example, it can easily get the first page of the article listing of a given tag: http://example.com/myblog/index.php/archives/tag/mytag. However, for some reason it can't get the subsequent pages, like http://example.com/myblog/index.php/archives/tag/mytag/page/2.
I've tried to figure out what was going on, and here's what I found: while the server answers normally to most requests, upon such requests it answers with a 301 permanent redirect. Peculiarly, the Location header is set to the exact same URL as the request! Basically, the server tells me to redirect my request of the page http://example.com/myblog/index.php/archives/tag/mytag/page/2 to... the very same page :P
When trying to access the page from the browser I get the page without a problem. I thought maybe the browser sends some headers (including cookies) that my bot doesn't send, so I copied the headers (including the cookies) from my browser's web console, but the behaviour didn't change.
I would appreciate any suggestions regarding what might be causing this strange behaviour, what I can do in order to understand what's going on better, and of course what I can do in order to fetch those pages automatically, just like I fetch their brethren.
Thanks!
It seems this post hasn't generated much public interest. However, in case somebody ever runs into the same problem and finds this post, here's the solution I used. Important note: I still don't understand the behaviour I witnessed, and would appreciate it if somebody could explain it.
So the solution I've found is basically to use the URL http://example.com/myblog/archives/tag/mytag?paged=2 instead of http://example.com/myblog/index.php/archives/tag/mytag/page/2. Funnily enough, this URL gets redirected to the original one when opened in a browser! But when the bot requested it, it got the page without any redirection. (So I managed to do what I wanted, but I've got no idea what happened there, why there was a problem in the first place, or why this solution worked: for one URL the bot gets infinite redirection and the browser just gets the page, while for the other the browser gets redirected [finitely] and the bot gets the page. I have yet to figure this one out...)
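For anyone debugging something similar, here's a sketch of how a bot can detect this kind of self-redirect instead of looping on it (C#; automatic redirects are disabled so the 301 becomes visible):

    var handler = new System.Net.Http.HttpClientHandler { AllowAutoRedirect = false };
    using var client = new System.Net.Http.HttpClient(handler);
    string url = "http://example.com/myblog/index.php/archives/tag/mytag/page/2";
    using var response = client.GetAsync(url).GetAwaiter().GetResult();
    if ((int)response.StatusCode == 301 && response.Headers.Location?.ToString() == url)
    {
        // The server redirects the URL to itself; bail out or try an
        // alternative URL form rather than following redirects forever.
    }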

Circular redirect path detected and wrong Open Graph data displayed

When sharing the following URL to Facebook
www.magicsoftware.com
You will get outdated information. Facebook refers to the site (magicsoftware.com/en) and takes all the information from the cache.
I tried to clear the cache by going to the debugger:
https://developers.facebook.com/tools/debug/og/object?q=www.magicsoftware.com
But that didn't help much.
Does anyone have an idea what I can do?
P.S. If you check the debugger link, you will see two critical errors listed under "Errors That Must Be Fixed":
Could Not Follow Redirect: URL requested a HTTP redirect, but it could not be followed.
Circular Redirect Path: Circular redirect path detected (see 'Redirect Path' section for details).
What does that mean?
Your server is issuing a redirect to the same URL as the one visited, based on some condition; according to my tests, any request that arrives without an Accept-Language header gets redirected.
Compare the response with an Accept-Language header and without any headers.
The Facebook linter doesn't seem to pass this header while crawling your Open Graph meta tags, so it hangs in the redirect loop.
You should avoid that redirect (or at least provide a fallback) so the Facebook linter can collect updated data and refresh the cached version.
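A sketch of how to verify that Accept-Language condition yourself (C#; automatic redirects disabled so the loop is visible):

    var handler = new System.Net.Http.HttpClientHandler { AllowAutoRedirect = false };
    using var client = new System.Net.Http.HttpClient(handler);
    var request = new System.Net.Http.HttpRequestMessage(
        System.Net.Http.HttpMethod.Get, "http://www.magicsoftware.com/");
    request.Headers.Add("Accept-Language", "en-US"); // remove this line to reproduce the loop
    using var response = client.SendAsync(request).GetAwaiter().GetResult();
    System.Console.WriteLine((int)response.StatusCode);  // 200 with the header, a 30x without
    System.Console.WriteLine(response.Headers.Location); // the self-referencing Location, if any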
The same thing is happening to me now. I have no redirect in place, but I am getting the message "there was an error following the redirect path." when using the debugger on this URL: http://www.mmaid.co/cleaning-services/offers/coupons/social-discount.php. I will give it time and see if it fixes itself.
I found the solution myself - and it's only patience :)
Facebook just needs time to refresh its cache. So the solution is simply to enter your URL in the Facebook Debugger and then wait; Facebook will automatically refresh the cache for that URL.

Response Redirect URL returns HTTP Error 400 - Bad Request

I'm a noob when it comes to ASP.NET. I know a few basic commands, such as Response.Redirect("URL"), to redirect my application's web page to a different location.
However, I receive HTTP Error 400 - Bad Request whenever I try to use the code shown below:
Response.Redirect(Server.UrlEncode(this.Downloadlink));
where this.Downloadlink is a user-defined property which returns something like this:
http://mdn.vatsag.net/fp;files/DOWNLOAD/VTSetup.exe
If I paste this link in the browser, the .exe file pops up (meaning the link is good).
However, the error comes when I use it in the ASP.NET code.
Any response on this issue, or an explanation of the reason, is deeply appreciated.
See here: http://www.kirit.com/Response.Redirect%20and%20encoded%20URIs
In short: if you quickly want to fix the issue, remove the part of your code that is UrlEncoding the URL!
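In other words (a sketch using the poster's Downloadlink property): Server.UrlEncode escapes the entire string, including the :// after the scheme and the path separators, so the Location header likely no longer looks like a valid absolute URL and the browser ends up requesting a mangled path. Redirect to the raw URL, and encode only individual querystring values when needed (the download.aspx example below is hypothetical):

    // Before: encodes ':' and '/' too, producing an invalid redirect target.
    // Response.Redirect(Server.UrlEncode(this.Downloadlink));

    // After: pass the URL through as-is.
    Response.Redirect(this.Downloadlink);

    // If a single querystring value needs escaping, encode just that value:
    string url = "http://example.com/download.aspx?file=" + Server.UrlEncode("VT Setup.exe");
    Response.Redirect(url);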

ASP.NET 404 (page not found) redirection with original parameters preserved

I'm replacing an old web application with a new one, with different structure.
I cannot change the virtual directory path for the new app, as I have users who have bookmarked different links to the old app.
Let's say I have a user who has this bookmark:
http://server/webapp/oldpage.aspx?data=somedata
My new app is going to reside in the same virtual directory, replacing the old one, but it no longer has oldpage.aspx; it has a different layout, but it still needs the parameter from the old URL.
So, I have set 404 errors to redirect to redirectfrombookmark.aspx, where I decide how to process the request.
The problem is that the only parameter I receive is "aspxerrorpath=/webapp/oldpage.aspx", not the "data" parameter, and I need it to process the request correctly.
Any idea how I can get the full "original" url in the 404 handler?
EDIT: reading the answers, it looks like I did not make the question clear enough:
The users have bookmarked many different pages (oldpage1, oldpage2, etc.) and I should handle them equally.
The parameters for each old page are almost the same, and I need only specific ones.
I want to re-use the "old" virtual directory name for the "new" application.
The search bots, etc., are not a concern, this is internal application with dynamic content, which expires very often.
The question is: can I do this without creating a bunch of empty pages in my "new" application with the old names and a Response.Redirect in their OnLoad? I.e., can this be done using the 404 mechanism, some event handling in Global.asax, etc.?
For the purposes of SEO, you should never redirect on a 404 error. A 404 should be a dead end, with some helpful information on how to locate the page you're looking for, such as a site map.
You should be using a 301, Moved Permanently. This allows the search bots to update their index without losing the page rank assigned to the original page.
See: http://www.webconfs.com/how-to-redirect-a-webpage.php on how to code this type of response.
You could look into the UrlRewritingNet component.
You should also look into using some of the events in your Global.asax file to check for errors and redirect intelligently. The Application_Error event is what you want to work with. You will have the variables from the request at that point in time (on the HttpContext object), and you can have your code work there instead of in a 404 page. If you go this route, be sure you still handle the 404 correctly for anything other than oldpage.aspx.
I am sorry I don't have any explicit examples or information right now, hopefully this will point you in the right direction.
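A rough sketch of that Global.asax route (classic Web Forms assumed; the new-page mapping is hypothetical). Unlike aspxerrorpath, the original request here still carries its querystring:

    // Global.asax.cs
    void Application_Error(object sender, EventArgs e)
    {
        var ex = Server.GetLastError() as HttpException;
        if (ex != null && ex.GetHttpCode() == 404)
        {
            // Request.RawUrl is e.g. "/webapp/oldpage.aspx?data=somedata",
            // so both the old page name and its parameters survive here.
            string data = Request.QueryString["data"];
            Server.ClearError();
            // Hypothetical mapping to the new page; RedirectPermanent (301)
            // needs .NET 4 - on older versions set the 301 status by hand.
            Response.RedirectPermanent("~/newhome.aspx?data=" + Server.UrlEncode(data));
        }
    }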
POST and GET parameters are only available per request. If you already know the name of the old page (OldPage.aspx), why not just add a custom redirect in it?
