Removing a specific page from a custom output cache provider in ASP.NET

I'm using ASP.NET 4.0 (Web Forms website) and I've implemented a custom disk-based output cache provider, which works fine for caching the pages on my site. The problem, however, is that I cannot find a way to remove a specific page from the cache, and I suspect URL rewriting is the cause.
For example, if I have a page called test.aspx, I can easily remove it from the cache using:
HttpResponse.RemoveOutputCacheItem("/test.aspx");
This does not work, though, for a page that is output cached by id, e.g. http://www.example.com/page/1, which is rewritten to http://www.example.com/myfolder/Page.aspx?id=1.
I can see the cached version of the page but I cannot remove it from the disk cache as I need an absolute virtual path for RemoveOutputCacheItem to work. I've tried the following:
HttpResponse.RemoveOutputCacheItem("/myfolder/Page.aspx?id=1");
HttpResponse.RemoveOutputCacheItem("/myfolder/page/1");
And a lot of other variations but nothing seems to work.
When I call RemoveOutputCacheItem, the key it generates for the URL does not match the stored key a2_myfolder_page.aspxHQNidV1FCDE.
Is there any solution for that? Or another way to get specific cached pages evicted? Thanks

Have you tried using VaryByParam or VaryByCustom, with "id" as your identifier? http://msdn.microsoft.com/en-us/library/ms153453.aspx
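For example (a rough sketch; your duration and page name will differ), if Page.aspx is cached per id like this:

<%@ OutputCache Duration="3600" VaryByParam="id" %>

then each id is stored as a variation under the page's virtual path, and a single call should evict all of the variations at once:

HttpResponse.RemoveOutputCacheItem("/myfolder/Page.aspx");

Note that .NET 4 also added a RemoveOutputCacheItem(path, providerName) overload for targeting a custom output cache provider by name.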

Related

Single Page ASP application and Routing

I thought this would be very simple but it's proved difficult to find the information I'm looking for online... I have a single page website using ASP and I want to use query strings but I don't want to have to put Default.asp in the URL. So at the moment the URL would be something like:
http://localhost/default.asp?id=4
But I'd like something like:
http://localhost/watch?id=4
I'm not using Visual Studio, so I was hoping to be able to do this with a simple text editor. I'm not that familiar with ASP either, so I'm very much a newbie with these concepts.
You have 2 options.
1: Use the URL Rewrite extension from Microsoft (a free IIS add-on).
You can set up an inbound rule for the pattern watch using wildcards, and add conditions that disable the rule when a file or directory actually exists (URL Rewrite sets this by default). Then create your action to point to default.asp and check the "Append query string" box. It will then route all /watch URLs to your default.asp page and allow your script to read query strings as it normally would.
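For reference, a sketch of what that rule might look like in web.config (the rule name is made up; this sits inside <system.webServer>):

<rewrite>
  <rules>
    <rule name="WatchToDefault" stopProcessing="true">
      <match url="^watch$" />
      <conditions>
        <add input="{REQUEST_FILENAME}" matchType="IsFile" negate="true" />
        <add input="{REQUEST_FILENAME}" matchType="IsDirectory" negate="true" />
      </conditions>
      <action type="Rewrite" url="default.asp" appendQueryString="true" />
    </rule>
  </rules>
</rewrite>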
2: Use a custom 404 error page:
Point your 404 error page at the default.asp page in IIS, but you will need to use Request.ServerVariables("QUERY_STRING") to parse out the actual URL requested, since a 404 error page cannot read the query string through Request.QueryString the way option 1 can. You will have to create your own URL-reading system, because the data returned as the query string will look like ?404;http://www.domain.com/watch?id=12312 instead of ?id=12312 as it would in option 1.
Personally I would use the first option. Even though it requires installing the extension, it's more flexible, it doesn't fill up the logs with two separate requests per hit like option 2 does, and it's more transparent to your script than building your own URL-reading system as option 2 requires.

URL Rewriting for a Subsite?

So this is what I'm trying to achieve...
I have a site with the URL with the following format:
somedb.mysite.com
somedb can be many different DBs. The problem is, different DBs require different versions of the site. These different versions are setup as subsites of the parent site.
For example somedb.mysite.com/1.0 and somedb2.mysite.com/2.0
Currently I'm using Response.Redirect() in the parent site to redirect to the proper version. What I'm HOPING to do is to HIDE the version number so that all DBs appear to be using the same site, and so the URL appears the same if a DB is updated to a newer version.
I've been messing around with RewritePath and Server.Transfer without much success... The main problem (from what I can gather) is that RewritePath and Transfer only work within the same site, but the individual versions are technically different subsites.
Does anybody have any ideas how I may be able to achieve what I'm trying to do?
I would suggest you write your own handler to intercept incoming requests for your subdomains.
To achieve this you can implement the IHttpHandler interface and put the URL redirection logic in the ProcessRequest method.
Here is a very good example: http://www.codeproject.com/Articles/30907/The-Two-Interceptors-HttpModule-and-HttpHandlers
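As a rough sketch of that approach (the class name and the version lookup are made up, and TransferRequest assumes the IIS integrated pipeline):

using System;
using System.Web;

public class VersionRoutingHandler : IHttpHandler
{
    public bool IsReusable { get { return true; } }

    public void ProcessRequest(HttpContext context)
    {
        // "somedb.mysite.com" -> "somedb"
        string db = context.Request.Url.Host.Split('.')[0];

        // Hypothetical lookup: map the database name to its site version.
        string version = LookupVersionForDb(db); // e.g. "1.0" or "2.0"

        // Forward to the versioned subsite without changing the URL the
        // browser sees.
        context.Server.TransferRequest(
            "/" + version + context.Request.Path + context.Request.Url.Query);
    }

    private static string LookupVersionForDb(string db)
    {
        // Placeholder only; a real implementation would read config or a DB.
        return db == "somedb" ? "1.0" : "2.0";
    }
}

The handler still has to be registered in web.config for the requests you want to intercept; if you need to catch every request, an HttpModule (also covered in the article above) may be the better interceptor.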

Updateable Google Sitemap for ASP.NET 3.5 Web App Project

I am working on an ASP.NET 3.5 Web Application project in C#. I have manually added a Google-friendly sitemap which includes entries for every page in the project - this is not a CMS.
<url>
<loc>http://www.mysite.com/events.aspx</loc>
<lastmod>2009-11-17T20:45:46Z</lastmod>
<changefreq>daily</changefreq>
<priority>0.8</priority>
</url>
The client updates events using an admin back-end. Other than that, the site is relatively static. I'm trying to decide on the best way to update the <lastmod> values for a handful of pages that are regularly updated.
In particular, I am using the QueryStringField of the DataPager control (paired with a ListView) to enhance SEO, as described here:
https://web.archive.org/web/20211029044137/https://www.4guysfromrolla.com/articles/010610-1.aspx
http://gsej.wordpress.com/2009/05/31/using-a-datapager-with-both-a-querystringfield-and-renderdisabledbuttonsaslabels/
When the QueryStringField property is set, the DataPager renders the paging interface as a series of hyperlinks which the crawler can follow and index. However, if Google has crawled my list of events two days ago, and in the meantime, the admin has added another dozen events... say the page size is set to 6; in this case, the Google SERP links would now be pointing to the wrong pages. This is why I need to be sure that the sitemap reflects changes to the events page as soon as they happen.
I have already looked though other SO questions for info and didn't find what I needed. Can anyone offer some guidance or an alternative approach?
UPDATE:
Since this is a shared hosting environment, a directory watcher/service won't work:
How to create file watcher in shared webhosting environment
UPDATE:
Starting to realize that I may need to signify to Google that the containing page has been updated; update the Last-Modified HTTP header?
Rather than using a hand-coded sitemap, create a sitemap handler that will generate the sitemap on the fly. You can create a method in the handler that will grab pages from an existing navigation sitemap, from the database, or even from a hard-coded list of pages. You can create an XmlDocument from the list, and write the InnerXml of the document out to the handler response stream.
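A trimmed-down sketch of such a handler (GetPages and its hard-coded data are placeholders for your real page list):

using System;
using System.Web;
using System.Xml;

public class SitemapHandler : IHttpHandler
{
    private const string Ns = "http://www.sitemaps.org/schemas/sitemap/0.9";

    public bool IsReusable { get { return true; } }

    public void ProcessRequest(HttpContext context)
    {
        XmlDocument doc = new XmlDocument();
        XmlElement urlset = doc.CreateElement("urlset", Ns);
        doc.AppendChild(urlset);

        foreach (PageEntry page in GetPages())
        {
            XmlElement url = doc.CreateElement("url", Ns);
            XmlElement loc = doc.CreateElement("loc", Ns);
            loc.InnerText = page.Url;
            XmlElement lastmod = doc.CreateElement("lastmod", Ns);
            lastmod.InnerText = page.LastModified.ToString("yyyy-MM-dd'T'HH:mm:ss'Z'");
            url.AppendChild(loc);
            url.AppendChild(lastmod);
            urlset.AppendChild(url);
        }

        context.Response.ContentType = "text/xml";
        context.Response.Write(doc.OuterXml);
    }

    private class PageEntry { public string Url; public DateTime LastModified; }

    private static PageEntry[] GetPages()
    {
        // Placeholder: pull this from the database or a navigation sitemap.
        return new[]
        {
            new PageEntry { Url = "http://www.mysite.com/events.aspx",
                            LastModified = DateTime.UtcNow }
        };
    }
}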
Then, create a class with a method that will automatically ping search engines with the above handler's URL (like http://www.google.com/webmasters/tools/ping?sitemap=http://www.mysite.com/sitemap.ashx).
Whenever someone adds a new event, call the above method. This will ping Google with your latest sitemap (freshly generated by the handler).
You want to make sure that the ping only works if the sitemap has actually been updated. You could use File.SetLastWriteTime on events.aspx in the AddNewEvent handler to signify that the containing page has been updated.
Also, be careful to make sure there have been no pings within the last hour (Google's guidelines discourage pinging more than once per hour).
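A hedged sketch of that ping-with-throttle logic (the static timestamp is illustrative; persisting it would survive app restarts, and the sitemap URL is assumed):

using System;
using System.Net;

public static class SitemapPinger
{
    private static DateTime lastPing = DateTime.MinValue;
    private static readonly object sync = new object();

    public static void PingGoogle()
    {
        lock (sync)
        {
            // Respect the once-per-hour guideline.
            if (DateTime.UtcNow - lastPing < TimeSpan.FromHours(1))
                return;
            lastPing = DateTime.UtcNow;
        }

        using (WebClient client = new WebClient())
        {
            client.DownloadString(
                "http://www.google.com/webmasters/tools/ping?sitemap=" +
                Uri.EscapeDataString("http://www.mysite.com/sitemap.ashx"));
        }
    }
}

Call SitemapPinger.PingGoogle() from the AddNewEvent handler after the event is saved.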
I actually plan to implement this in the following OSS project: http://cyclemania.codeplex.com. I will let you know once it's done and you can have a look.
If you let your users add events to the website, you are probably using a database.
This means you can generate the XML sitemap at runtime, like this:
create a page where your sitemap will be available (this doesn't need to be sitemap.xml but can also be sitemap.aspx or even sitemap.ashx).
open a database connection
loop through all records and create an Xml Element for each record
This blog post should help you further: Build a Search Engine SiteMap in C#.
It does not use the new XElement API from .NET 3.5, but it will work fine.
You can put this in an .aspx page, but adding an HttpHandler is probably better, as described in a different post on the same blog (creating an HttpHandler for a sitemap).

ASP.Net Context.User.Identity weirdness

I have an ASP.NET 3.0 SP1 app that uses Forms Authentication.
While testing, I noticed that if I viewed a page that another user was viewing, the other user's name would be displayed in the control on my master page. Context.User.Identity is also that of the other user.
If I switch to a different page that no one else is viewing, Context.User.Identity is correct.
I'm stumped and would appreciate suggestions.
Thanks in advance.
Chris
Maybe because output caching is enabled for the page: if the page is cached server-side with VaryByParam=none, all users will get the same copy from the cache.
I can only think of two things that can cause this:
You're storing user-specific data in a place shared between requests (e.g. in a static(C#)/shared(VB) variable, in the ASP.NET Cache, in the Application object, ...)
You have output caching enabled.
Check for:
OutputCache directives in your aspx and ascx files,
system.web/caching element in your web.config file(s),
Calls to Response.Cache.SetCacheability (the HttpCachePolicy.SetCacheability method).
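If the page genuinely needs output caching but shows user-specific content, one option (a sketch, assuming forms authentication) is to vary the cached copies per user. In Global.asax:

public override string GetVaryByCustomString(HttpContext context, string custom)
{
    if (custom == "user")
    {
        // One cached copy per authenticated user, one shared anonymous copy.
        return context.User.Identity.IsAuthenticated
            ? context.User.Identity.Name
            : "anonymous";
    }
    return base.GetVaryByCustomString(context, custom);
}

and on the page: <%@ OutputCache Duration="60" VaryByParam="None" VaryByCustom="user" %>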
If you can't find the problem:
Try creating a simplified version of your application until you get the simplest possible version that still reproduces the undesirable behaviour.
During this process of simplification you'll likely discover the problem for yourself. If not, post some code from the simplified version.
Make sure no one is following a link that carries the authentication ticket in the URL, which can happen with cookieless forms authentication.
Also make sure to review anything else that might be sharing data among requests. Just like DOK said, remember the Application object isn't the only way you could be doing that.
It looks like the issue was caused by my setting targetframe="_self" or Target="_self". I removed all of these and everything seems to be working fine.
One other note: if I refreshed the page, it would then display the correct user.

Where do content-based websites store their content?

Sites like cnn.com or foxnews.com.
Where do they store all the articles? In HTML files? In a database?
It seems more logical to store everything in a database, but then how do you generate a static link to something that lives inside the database?
They don't have a dynamic page load like LoadArticle.aspx?ArticleID=123; every article has its own address.
Please explain how this is done.
They use a special content management library called VoodooLib.dll.
Seriously, when you write something to a database, you normally generate some kind of unique identifier (123, for example). It gets permanently associated with that record (the article content), and from then on the same id can be used as part of a URL.
As for the static link, it is a simple matter of URL rewriting.
You generate static-looking links to display on a page because they work much better for SEO. When a request for such a URL hits the server, it is substituted with something "server friendly" and then processed.
They probably use some form of Content Management System (CMS). There are many different ones out there; most store the actual content in a database or as XML (some store XML in a database). They will then either publish that content as static HTML pages or, more commonly now, as dynamic pages that are cached. Many use what are known as "friendly URLs": virtual addresses that are mapped to the actual physical file path using URL-rewriting techniques.
Note you can't tell whether a page is dynamic or static simply from the extension. It is quite possible to have dynamic pages that end in the .html extension.
Just because the URL looks "static" doesn't mean it is; they could be using something like mod_rewrite or an IIS ISAPI to make the URLs more search engine friendly.
For the high-volume news sites that you mention, however, they may very well generate the pages statically in order to prevent overloading the database with repeated requests for the same article.
Look at the URL of this page; it doesn't have xxx.aspx?some-query-string.
You are referring to friendly URLs.
One common way to do something like that is to use URL rewriting and/or a custom HttpModule.
Here's a good reference: http://weblogs.asp.net/scottgu/archive/2007/02/26/tip-trick-url-rewriting-with-asp-net.aspx
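As an illustrative sketch (the /article/ URL scheme and page name are made up), the Global.asax/HttpModule flavor of this boils down to a RewritePath call early in the pipeline:

// In Global.asax
void Application_BeginRequest(object sender, EventArgs e)
{
    HttpApplication app = (HttpApplication)sender;
    string path = app.Request.Path; // e.g. "/article/123"

    if (path.StartsWith("/article/", StringComparison.OrdinalIgnoreCase))
    {
        // Map the friendly URL back onto the real page and query string.
        string id = path.Substring("/article/".Length);
        app.Context.RewritePath("~/LoadArticle.aspx?ArticleID=" + id);
    }
}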
Just because a page has a normal URL does not mean that it isn't serving dynamic content. With the Apache mod_rewrite module, it is possible to manipulate URLs. So, for example, a page like http://www.domain.tld/permalink/12345/message-title-slug can be converted internally to http://www.domain.tld/permalink/index.php?id=12345&slug=message-title-slug.
I do not know exactly what cnn.com and foxnews.com use, but I would bet that they use a Content Management System (CMS) which serves all pages dynamically, with the content stored either in a database or on the filesystem, and with authoring/publishing all being performed through the particular CMS.
Just checking cnn.com, the article links contain:
Year
Location (US or WORLD/specificlocationid)
Month
Day
Article name.
All of this information together can be used to uniquely identify any article (even less of it is probably actually needed). The dynamic content loading page address could easily be hidden by some method of URL rewriting, and then the information in the requested URL is used to determine which article in the DB is to be served up.
I don't know why all the other answerers seem to assume that some form of URL rewriting is necessary to create friendly URLs. It's not true at all.
It's perfectly possible to write web serving code that splits a URL into parameters - eg year, month, title - and pass that directly to the code that gets the content from the database, without any need to rewrite the URL. Most modern web frameworks such as Django and Rails include this functionality out of the box.
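In ASP.NET terms (the stack this thread is about), the equivalent is System.Web.Routing; here is a minimal sketch with made-up route and page names (MapPageRoute is ASP.NET 4.0):

using System;
using System.Web.Routing;

// In Global.asax
void Application_Start(object sender, EventArgs e)
{
    // Map a friendly URL pattern onto a physical Web Forms page.
    RouteTable.Routes.MapPageRoute(
        "article",                  // route name
        "permalink/{id}/{slug}",    // friendly URL pattern
        "~/LoadArticle.aspx");      // page that serves the content
}

// In LoadArticle.aspx.cs the values arrive without any query string:
// string id = (string)Page.RouteData.Values["id"];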
This is done through mod_rewrite techniques.
Here's an article about the mod rewriting engine: http://httpd.apache.org/docs/1.3/mod/mod_rewrite.html
And here's their "guide": http://httpd.apache.org/docs/2.0/misc/rewriteguide.html
I hope that helps. It should make for a good starting point. Good luck.
