Considerations for changing a page's URL in a CMS - asp.net

I have written a CMS for a website. You can create pages and do all things you would expect but I am just wanting your opinions on what to do if a user changes the URL of a page. You would need to do a 301 for the previous stored URL but if the user changes the URL 10 times you have to account for all those changes.
Therefore do you not allow users to change the URL or are there other approaches?
Thanks

I'd imagine that a user renaming a page isn't going to happen very often, so you might be able to afford to run a scan through all of the pages in your database looking for references to the previous URI. Present a warning page to the user, saying "All of these pages have links that will now go to 404 because of this change", and give them options:
Establish a 301 as you're thinking
Automagically update the identified links
Don't rename the page
Don't make any of the changes
Of course, you could always just perform the automatic update and let the user back that out too, but that necessitates a fairly complex WAL set up that I can tell you from experience is a huge pain.
Just my $0.02!

If you are worried about the 10 sequential requests caused by the 301's, you could have a script periodically going through all "redirect pages", figure out the most recent URL they now point to, and point them straight there without intermediate redirects.
Alternatively, keep a list of the original URLs along with the lastest one, so you can update all of them when the URL changes again.

Quite a lot of CMS's simply just don't allow the permalinks to be changed so if your worried about not complying with some rule you wouldn't be the first to not allow it.
If you do however implement changing the permalinks (say by changing the title) you'll have to store N titles for each content item and just redirect to the title with the highest id.
ContentItemTitle
id - auto
increment text - unique constraint
contentId - reference
This is linked back to ContentItem as a 1 to 1 relationship.
Then when you recieve a HTTP request for <text> just lookup all the ContentItemTitle rows that share the same <contentId> as <text> and pick the one one with the highest <id> and redirect to that.

Related

Symfony - Restrict QueryParams

Is there way to force Symfony throw 404 if there is some extra params ?
For example, I have route /news/ and I want to allow only date parameter. So link could exist in this form: /news/?date=243242, but I want 404 if user enters following link: /news/?param=2 ?
Thanks.
(I don't want to check query params in controller, I know I can)
Do you really need these to be get params? You meet your objective buy having them as values in the URL itself e.g.
#Route("/news/date/{date}")
Slightly different I know - but you can enforce it
Why do you care about the extra params anyway? If some nasty user decides to play with the URL directly, your app is not supposed to behave correctly.
Don't bother with all these checks — unless those params somehow affect security.
Based on your comment, you want to respond with 404 to get rid of duplicate content in Google. There are several steps you need to take to solve that problem.
If a user enter an extra parameter manually, in no way that would add a page to the Google index. So, if you're having duplicate pages based on different params in the Google index, it means that you have links with those extra params on your site. That's how they end up being indexed.
First thing you could do would be to get rid of those links. Then you could go to the Google Webmaster Tools and manually remove the indexed pages with those extra parameters from the index. If you don't have the problematic links anymore, they won't get to the index again.
If for some reason you can't get rid of those link, go to the Webmaster Tools and consult the URL Parameters section to understand how to add parameters that Google should ignore.

Abusing HTTP POST

Currently reading Bloch's Effective Java (2nd Edition) and he makes a point to state, in bold, that overusing POSTs in web applications is inherently bad. Unfortunately, he doesn't specify why.
This startled me, because when I do any web development, all I ever use are POSTs! I have always steered clear of GETs for security reasons and because it felt more professional (long, unsightly URLs always bother me for some reason).
Are there performance differentials between GET and POST? Can anyone elaborate on why overusing POSTs is bad, and why? My understanding - and preliminary searches - seem to all indicate that these two are handles very similarly by the web server. Thanks in advance!
You should use HTTP as it's supposed to be used.
GET should be used for idempotent, read queries (i.e. view an item, search for a product, etc.).
POST should be used for create, delete or update requests (i.e. delete an item, update a profile, etc.)
GET allows refreshing the page, bookmark it, send the URL to someone. POST doesn't allow that. A useful pattern is post/redirect/get (AKA redirect after post).
Note that, except for long search forms, GET URLs should be short. They should usually look like http://www.foo.com/app/product/view?productId=1245, or even http://www.foo.com/app/product/view/1245
You should almost always use GET when requesting content. Only use POST when you are either:
Transmitting sensitive information which should not appear in the URL bar, or
Changing the state on the server (adding/changing/deleting stuff, altough recently some web applications use POST to change, PUT to add and DELETE to delete.)
Here's the difference: If you want to give the link to the page to a friend, or save it somewhere, or even only add it to your bookmarks, you need the full URL of the page. Just like your address bar should say http://stackoverflow.com/questions/7810876/abusing-http-post at the moment. You can Ctrl-C that. You can save that. Enter that link again, you're back at this page.
Now when you use any action other than GET, there is simply no URL to copy. It's like your browser would say you are at http://stackoverflow.com/question. You can't copy that. You can't bookmark that. Besides, if you would try to reload this page, your browser would ask you whether you want to send the data again, which is rather confusing for the non-tech-savy users of your page. And annoying for the entire rest.
However, you should use POST/PUT when transferring data. URL's can only be so long. You can't transmit an entire blog post in an URL. Also, if you reload such a page, You'll almost certainly double-post, because the above described message does not appear.
GET and POST are very different. Choose the right one for the job.
If you are using POST for security reasons, I might drop a mention of other security factors here. You need to ensure that you send the data from a form submit in encrypted form even if you are using POST.
As for the difference between GET and POST, it is as simple as GET is used to send a GET request. So, you would want to get data from a page and act upon it and that is the end of everything.
POST on the other hand, is used to POST data to the application. I am talking about transactions here (complete create, update or delete operations).
If you have a sensitive application that takes, say and ID to delete a user. You would not want to use GET for it because in that case, a witty user may raise mayhem simply changing the ID at the end of the URL and deleting all random uses.
POST allows more data and can be hacked to send streams of files as well. GET has a limited size though.
There is hardly any tradeoff in using GET or POST.

How to use ASP.NET Routing in a Quote of the Day Website

Good Afternoon,
A client is interested in creating an ASP.NET 2.0 website whose purpose is to serve up a "quote of the day". He wants the quotes on static content pages all attached to the same master page. The quote pages must be viewed in a certain sequence, and site browsers cannot view any other pages than the starting page when browsing to the site. That is, everyone must go to page 001.aspx when entering the site.
Two Questions:
1. The content pages are going to be created by the client using an excel data source and a merge process by which each quote page is created eg. 001.aspx, 002.aspx etc. This seems clunky to me at best. Would ASP.NET Dynamic Data be a better solution here?
I'm new to ASP.NET Routing and URL Rewriting as a whole. How would I setup a route table to ensure that users always entered the site on the same entry page, and create a route table such that default.aspx resolves to 001.aspx?
Thanks,
Sid
I would suggest to use the excel sheet as a data source and handle viewing the 'Quote pages' by paging through the result set obtained from said data source.
If your client is concerned about SEO, he must recognize that his requirement to have only one entry page defeats his One-Quote-One-Page-Is-SEO-friendly.
I don't think the effort to distinguish between a human user and a search bot is worth it.
Anyway googlebot is capable of indexing pages with URL parameters thus allowing to be SEO friendly without generating static content (other bots should be as well).
Possible solution
To allow search bots to index your Quotes you have a query parameter for the date of the Quote.
If you want to enforce human users (hackers don't count ;-)) to enter the site only by the current date you check the browser string and redirect any browser not being know as a search bot to the current date if the referer is not equal to the previous date.
This solution should give you a reasonable result without too much overhead.

Standard way to persist data between requests in ASP.NET-MVC

What is the most standard or best way to persist data between requests?
Should I use cookies or session variables? I'm interested in keeping data like sort order, sort column, and page number (for paginiation).
I'm coming from a webforms background so normally this type of thing was automatically handled for me in the viewstate of the controls I was using.
update
I like the querystring idea, for searching and more meaningful URLs; however, I'm working on an "index/list" view, which consists of a View with header, and "control" options, like DDLs for filtering and a partial view that renders the table of data.
The DDLs use a $.load() to call an ActionResult on the controller, which returns the partial view, passing parameters there in the querystring, but since these are ajax requests the main page url of the user's browser does not get updated.
Is there a best-practice for taking querystrings off the main-page URL and using them in ajax requests to other ActionResults?
If you want it to survive only through one request/redirect TempData is your friend.
However, for things like your pagination, URL is the best method, for the ability to share links alone.
A standard way is to pass those sort of things via URL Query Parameters. You can modify your routing to expect certain URL variables. That way the pages become more search engine friendly as well.
It depends on how permanent you want the information to be:
Things like the page number should indeed be in the URL (as others have pointed out) - this helps with bookmarking, etc, but remember that if you add more content to the list, then that bookmarked result set will not always be what the user wanted...
If you're happy for these values to be lost when a session times out (by default around 20 minutes), then put them in Session.
If you think that sessions are going to timeout before the next request, or you want to save it across visits then you should be storing them in either cookies, or a profile (potentially allowing "Anonymous" profiles, which work with the users cookies, so they would lose them across machines).
Personally, I'd think very carefully about putting sort order and columns in the URL if you do you could actually end up really confusing search engines:
Lots of pages with very similar content (page 1, sorted by date desc, page 1 sorted by date asc, etc) - search engines don't like duplicate content, and nor should you as Google (for instance) will only show two pages from your site in a default result set, you want them to be valid, not duplicates.
Search engines will spend lots more time crawling your site, and potentially give up - If on every page they find links to "Sort by this column", they will attempt to follow them, resulting in more work on the server, higher bandwidth use, etc.
These can be mitigated through the use of a Robots.txt file denying access to sorted versions of the page, but if this is generated almost dynamically that will be very complex to maintain going forward.
In response to your update, a nice way to achieve that for pages would be to have links to "Previous" and "Next" pages of results (or better yet, a list of all pages in the list), output on the page, with the page numbers, that you then hide with JavaScript.
This way users should see your nice, AJAXy behaviour, and search engines (and users without JavaScript - mobile, or those using older screen readers for instance) will still be able to get access to all your pages - this will help your pages to degrade gracefully, or use "Progressive Enhancement".
Things that were previously in viewstate should probably be put back in the clients hands via either hidden fields or cookies.
Session is "too" easy. In a dev environment it works great, pretty much no matter what you put in it. In production scalability and persistence become a problem. In-process session is likely to disappear unexpectedly if you have crashing bug in your site, and requires server affinity when load balancing. Out-of process session fixes the durability and affinity issues, but can still be a performance bottle neck if too much stuff is put in session. A VERY common problem is that each page will put 1 or 2 items into session but never take them out again when they are done. And even if a page removes it session data when it is no longer needed, the data can still get orphaned if a user starts a process and never completes it.
Cookies is a fast and simple way to persist data between requests, and you can also make them live only for a limited time depending on your needs.
Session are easiest.

Smart way to disallow users going to a site page directly

A site has 100's of pages, following a certain sitemap. A user can navigate to page2.aspx from page1.aspx. But if the user goes to page2.aspx directly say through a book marked URL, the user should be redirected to page1.aspx.
Edit: I dont want to go in and add code to every page that needs to fulfill this need.
Note: This is not a cross-page postback scenario.
You might consider something that is based off WorkFlow, such as this: http://blogs.msdn.com/mwinkle/archive/2007/06/07/introducing-the-pageflow-sample.aspx
The WCSF team also included a pageflow application block that you can use as a standalone add-on to your application.
I guess you could check the referrer, and if there isn't one / or it isn't page1.aspx then you could redirect back to page1.aspx.
As another answerer mentioned, you could use the Referrer header, but that can be faked by the client.
Since you don't want to modify each page, you could do something with an IHttpModule. Assuming you have some way of describing the valid page navigations, you could do something like this in the BeginRequest handler:
Check the session for a list of valid pages (using a default list for first visit if none are in the session).
If this request is for an invalid page, redirect to the place the user should be.
Based on this request, set up the list of valid pages and redirect page in the session so it's ready for the next request.
I recently worked with real code that checked to see if referrer was blank and used that as a step in authorization. The idea was users wouldn't be able to fake a referrer, you don't need a custom browser to fake a referrer. Users can book mark your page to delicious, then delicious.com is the referrer (and not blank).
I've had real arguments about how sophisticated a user needs to be to do certain hacks-- i.e. if users don't know how to set the referrer, then you can trust it. While true, it's unlikely your users will write a custom browser, but there already are Firefox addons to set headers, referrers etc and they're easy to use.
Josh has the best answer-- on page2 you should check the page hit log and see if the user has recently visted page1
I like alot of the answers above (specifically the workflow).
Another option, is creating each page as a usercontrol and having page1.aspx control what usercontrol gets loaded. This has the advantage of storing your workflow in a single place instead of on each page.
However, I don't think there's a magic bullet out there. It sounds like this security problem is an afterthought, or possibly reported as a bug, and you have been tasked with fixing it quickly and efficiently.
I would start weighing the answers here with their associated cost in hours.. I suspect the quickest solution will be to check referrer addresses on each page. Although hackable, it is obscure and if that risk is acceptable to you it may be the appropriate solution.

Resources