Create a Google-compliant dynamic XML sitemap - ASP.NET

I want to create a dynamic (fetching data from the database) XML sitemap which I can submit to Google Webmaster Tools.
Surprisingly, I couldn't find any recent controls/code online to do this. The most recent code I found was this http://weblogs.asp.net/bleroy/archive/2005/12/02/432188.aspx which is for ASP.NET 2.0. I don't mind using it, but I suspect it's outdated.
Can somebody please point me in the direction of code which accomplishes this?

A couple of options include:
The ASP.NET SiteMap infrastructure. It allows you to write a custom sitemap provider, like this one which uses Microsoft Access, to generate a sitemap.
You can also find a very simple sitemap generator project on this site.
Another option (and a fun learning experience) is to write your own by looking at the sitemap protocol and using LINQ to SQL along with LINQ to XML to generate the format. Here is an example that uses LINQ to SQL and LINQ to XML to generate the XML; a small sketch of the idea follows after the list of options.
Finally, Google also accepts RSS/Atom feeds, so you could generate one of those instead. If you go this route then you can make use of the SyndicationFeed class. There are also a couple of open source options available.
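To give a feel for the LINQ to SQL / LINQ to XML option, here is a minimal sketch. The EventsDataContext, its Events table, and the column names are hypothetical stand-ins for whatever your own model looks like:

// Minimal sketch: build a sitemap-protocol document with LINQ to XML,
// pulling rows from a hypothetical LINQ to SQL data context.
using System.Linq;
using System.Xml.Linq;

public static class SitemapBuilder
{
    public static XDocument Build()
    {
        XNamespace ns = "http://www.sitemaps.org/schemas/sitemap/0.9";

        using (var db = new EventsDataContext())   // hypothetical data context
        {
            // AsEnumerable() pulls the rows back first so the XElement
            // construction happens in memory instead of being translated to SQL.
            var urls = db.Events.AsEnumerable().Select(e =>
                new XElement(ns + "url",
                    new XElement(ns + "loc",
                        "http://www.mysite.com/events.aspx?id=" + e.EventId),
                    new XElement(ns + "lastmod",
                        e.LastModified.ToString("yyyy-MM-dd")),
                    new XElement(ns + "changefreq", "daily"),
                    new XElement(ns + "priority", "0.8")));

            return new XDocument(
                new XDeclaration("1.0", "utf-8", "yes"),
                new XElement(ns + "urlset", urls));
        }
    }
}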

Actually, I just did this recently using LINQ to XML:
How to generate xsi:schemalocation attribute correctly when generating a dynamic sitemap.xml with LINQ to XML?
The string returned by that code is written directly to the Response object. I use an .ashx HttpHandler to deliver the content as XML, and Routing to serve it under the name sitemap.xml. You should also reference it in your robots.txt file.
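As a rough sketch of that setup, assuming the sitemap XML is produced elsewhere (BuildSitemapXml() below is just a placeholder for that code):

using System.Web;
using System.Web.Routing;

// The handler writes the sitemap XML straight to the response.
public class SitemapHttpHandler : IHttpHandler
{
    public bool IsReusable { get { return true; } }

    public void ProcessRequest(HttpContext context)
    {
        context.Response.ContentType = "text/xml";
        context.Response.Write(BuildSitemapXml());
    }

    private string BuildSitemapXml()
    {
        // Placeholder: generate the <urlset> document here (e.g. with LINQ to XML).
        return "<?xml version=\"1.0\" encoding=\"utf-8\"?>" +
               "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\"></urlset>";
    }
}

// The route handler lets Routing map the friendly URL to the handler above.
public class SitemapRouteHandler : IRouteHandler
{
    public IHttpHandler GetHttpHandler(RequestContext requestContext)
    {
        return new SitemapHttpHandler();
    }
}

// In Global.asax, Application_Start:
//   RouteTable.Routes.Add("sitemap",
//       new Route("sitemap.xml", new SitemapRouteHandler()));

With the route registered, crawlers can be pointed at it from robots.txt with a line such as: Sitemap: http://www.mysite.com/sitemap.xml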

Related

parse specific website data and store them in db table

I want to parse a table row (with a given name, e.g. test) from a site that requires a username/password and store it in a database table.
Is this possible in ASP.NET (or ASP.NET MVC 4)?
* I have the username/password
* The site login form is: http://exat.ru/toursearch/
Thanks,
I think you are talking about web scraping, and ASP.NET might not be the best fit for what you are trying to do. There are a number of web scraping frameworks out there, e.g.
http://scrapy.org/ for python
or
http://spyderwebtech.wordpress.com/2008/08/07/scraping-websites-with-curl/ using CURL
You can have a look at HttpWebRequest, which can fetch the site data for you, although you may have to parse the returned HTML with a custom solution.
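As a very rough sketch of the HttpWebRequest route (the form field names and URLs here are guesses; inspect the actual login form for the real ones, and note that many sites also require hidden fields, viewstate, or other cookies to be posted back):

using System;
using System.IO;
using System.Net;
using System.Text;

class Scraper
{
    static string FetchProtectedPage(string loginUrl, string pageUrl,
                                     string user, string pass)
    {
        var cookies = new CookieContainer();

        // 1. Post the credentials to the login form (field names are hypothetical).
        byte[] postData = Encoding.UTF8.GetBytes(
            "login=" + Uri.EscapeDataString(user) +
            "&password=" + Uri.EscapeDataString(pass));

        var loginRequest = (HttpWebRequest)WebRequest.Create(loginUrl);
        loginRequest.Method = "POST";
        loginRequest.ContentType = "application/x-www-form-urlencoded";
        loginRequest.CookieContainer = cookies;
        using (Stream stream = loginRequest.GetRequestStream())
        {
            stream.Write(postData, 0, postData.Length);
        }
        loginRequest.GetResponse().Close();

        // 2. Request the protected page, reusing the session cookie.
        var pageRequest = (HttpWebRequest)WebRequest.Create(pageUrl);
        pageRequest.CookieContainer = cookies;
        using (WebResponse response = pageRequest.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}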

SiteMap for the search engines

I have to make an API call and get the response. The response contains more than 4000 URLs.
I have to list all of these URLs in a sitemap so that search engines can crawl them easily, and I have to write a handler for this task. Can someone suggest an example of how to do this?
I'll assume you are talking about a sitemap in XML format; you didn't specify what the source is beyond the fact that you make an API call. However, the third or so result from a Google search on "asp.net google sitemap" should give you a perfect starting point:
http://www.mikesdotnetting.com/Article/94/Create-a-Google-Site-Map-with-ASP.NET
I would suggest creating an ASHX handler (File -> New -> Generic Handler in Visual Studio) instead of a page like they do in the example.
Upload the handler to the website and add the sitemap to e.g. Google by using their Webmaster Tools.
A quick search on Google resulted in this link:
XML sitemap with ASP.NET
which should get you most of the way with the handler and with composing the XML.
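In case it helps, here is a minimal sketch of what such a Generic Handler could look like; GetUrlsFromApi() is a placeholder for whatever API call actually returns your 4000+ URLs:

using System.Collections.Generic;
using System.Web;
using System.Xml;

public class SitemapHandler : IHttpHandler
{
    public bool IsReusable { get { return true; } }

    public void ProcessRequest(HttpContext context)
    {
        context.Response.ContentType = "text/xml";

        using (var writer = new XmlTextWriter(context.Response.Output))
        {
            writer.Formatting = Formatting.Indented;
            writer.WriteStartDocument();
            writer.WriteStartElement("urlset",
                "http://www.sitemaps.org/schemas/sitemap/0.9");

            // One <url> entry per URL returned by the API.
            foreach (string url in GetUrlsFromApi())
            {
                writer.WriteStartElement("url");
                writer.WriteElementString("loc", url);
                writer.WriteElementString("changefreq", "weekly");
                writer.WriteEndElement();
            }

            writer.WriteEndElement();
            writer.WriteEndDocument();
        }
    }

    private IEnumerable<string> GetUrlsFromApi()
    {
        // Placeholder: call your API here and return the URLs it gives back.
        yield return "http://www.example.com/page1";
    }
}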

What different customizations are possible by using HttpHandlers in an ASP.NET application?

Digging deeper into HttpHandlers, I found that they provide a nice way to customize an ASP.NET application. I am new to ASP.NET and I want to know about the different customizations that are possible using HttpHandlers. Lots of websites talk about how they are implemented, but it would be nice to know some use cases for HttpHandlers beyond what ASP.NET already provides.
An ASPX page provides a base template (so to speak) for a form-based web page. By default, it outputs text/html and allows for easy adding of form elements and event handling for these elements.
In contrast, an HttpHandler is stripped to the bone. It is like a blank slate for HTTP requests. Therefore, an HttpHandler is good for many types of requests that do not necessarily require a web form. You could use an HttpHandler to output dynamic images, JSON, or many other MIME type results.
A couple of examples:
1) You have a page which needs to make an AJAX call that returns a JSON response. An HttpHandler could be set up to handle this request and output the JSON (a sketch follows below).
2) You have a page which links to PDF documents that are stored as binary blobs in a database. An HttpHandler could be set up to handle this request and output the binary blob as a byte stream with a PDF MIME type for the content type.
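For instance, a bare-bones handler for example 1) might look like the following (the anonymous payload is made up for illustration); example 2) would have the same shape, but with the content type set to application/pdf and Response.BinaryWrite used for the blob:

using System.Web;
using System.Web.Script.Serialization;

public class EventsJsonHandler : IHttpHandler
{
    public bool IsReusable { get { return true; } }

    public void ProcessRequest(HttpContext context)
    {
        // Hypothetical payload; in practice this would come from a database or service.
        var payload = new { id = 42, title = "Sample event", tickets = 12 };

        context.Response.ContentType = "application/json";
        context.Response.Write(new JavaScriptSerializer().Serialize(payload));
    }
}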
Check this page for a good example and code of why you might want to customize them: http://dotnetslackers.com/articles/aspnet/Range-Specific-Requests-in-ASP-NET.aspx Essentially, it can be used when you want to serve certain files but not allow them to be accessible via a plain URL (security).

Google Sitemap HttpHandler caching

I have an HttpHandler that generates a Google sitemap based on my ASP.NET web.sitemap. Fairly standard stuff, except that it does some fairly heavy database work to auto-generate additional URLs for Ajax tabs within pages.
All this means our DB gets hit fairly heavily if a bot hits sitemap.axd.
What we need, of course, is output caching. But how do you go about caching inside something that basically writes directly to an XmlTextWriter?
The simplest answer is to write the XML to a string and store it in a static field.
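A minimal sketch of that idea: build the XML once, keep it in a static string, and rebuild it only after it goes stale. The one-hour lifetime and the BuildSitemapXml() method are assumptions for illustration:

using System;
using System.Web;

public class CachedSitemapHandler : IHttpHandler
{
    private static string _cachedXml;
    private static DateTime _cachedAt;
    private static readonly object _lock = new object();

    public bool IsReusable { get { return true; } }

    public void ProcessRequest(HttpContext context)
    {
        lock (_lock)
        {
            if (_cachedXml == null || DateTime.UtcNow - _cachedAt > TimeSpan.FromHours(1))
            {
                _cachedXml = BuildSitemapXml();   // the expensive database work
                _cachedAt = DateTime.UtcNow;
            }
        }

        context.Response.ContentType = "text/xml";
        context.Response.Write(_cachedXml);
    }

    private string BuildSitemapXml()
    {
        // Placeholder for the existing XmlTextWriter/database code.
        return "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\"></urlset>";
    }
}

You could also store the generated string in HttpRuntime.Cache with an expiration instead of a static field, which lets ASP.NET evict it under memory pressure.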

Updateable Google Sitemap for ASP.NET 3.5 Web App Project

I am working on an ASP.NET 3.5 Web Application project in C#. I have manually added a Google-friendly sitemap which includes entries for every page in the project - this is not a CMS.
<url>
<loc>http://www.mysite.com/events.aspx</loc>
<lastmod>2009-11-17T20:45:46Z</lastmod>
<changefreq>daily</changefreq>
<priority>0.8</priority>
</url>
The client updates events using an admin back-end. Other than that, the site is relatively static. I'm trying to decide on the best way to update the <lastmod> values for a handful of pages that are regularly updated.
In particular, I am using the QueryStringField of the ListView control to enhance SEO as described here:
https://web.archive.org/web/20211029044137/https://www.4guysfromrolla.com/articles/010610-1.aspx
http://gsej.wordpress.com/2009/05/31/using-a-datapager-with-both-a-querystringfield-and-renderdisabledbuttonsaslabels/
When the QueryStringField property is set, the DataPager renders the paging interface as a series of hyperlinks which the crawler can follow and index. However, if Google crawled my list of events two days ago and the admin has since added another dozen events (say the page size is set to 6), the Google SERP links would now be pointing to the wrong pages. This is why I need to be sure that the sitemap reflects changes to the events page as soon as they happen.
I have already looked through other SO questions for info and didn't find what I needed. Can anyone offer some guidance or an alternative approach?
UPDATE:
Since this is a shared hosting environment, a directory watcher/service won't work:
How to create file watcher in shared webhosting environment
UPDATE:
Starting to realize that I may need to signify to Google that the containing page has been updated; perhaps by updating the Last-Modified HTTP header?
Rather than using a hand-coded sitemap, create a sitemap handler that will generate the sitemap on the fly. You can create a method in the handler that will grab pages from an existing navigation sitemap, from the database, or even from a hard-coded list of pages. You can create an XmlDocument from the list, and write the InnerXml of the document out to the handler response stream.
Then, create a class with a method that will automatically ping search engines with the above handler's URL (like http://www.google.com/webmasters/tools/ping?sitemap=http://www.mysite.com/sitemap.ashx).
Whenever someone adds a new event, call that ping method. This will point Google at your latest sitemap (which the handler generates fresh on each request).
You want to make sure that the ping only works if the sitemap has actually been updated. You could use File.SetLastWriteTime on events.aspx in the AddNewEvent handler to signify that the containing page has been updated.
Also, be careful to make sure there have been no pings for the last hour (as Google's guidelines discourage pinging more than once per hour).
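A rough sketch of such a ping helper, throttled to once per hour (the sitemap URL and the static timestamp are assumptions for illustration):

using System;
using System.Net;

public static class SitemapPinger
{
    private static DateTime _lastPing = DateTime.MinValue;
    private static readonly object _lock = new object();

    public static void Ping()
    {
        lock (_lock)
        {
            // Respect the "no more than once an hour" guidance.
            if (DateTime.UtcNow - _lastPing < TimeSpan.FromHours(1))
                return;

            string pingUrl = "http://www.google.com/webmasters/tools/ping?sitemap=" +
                             Uri.EscapeDataString("http://www.mysite.com/sitemap.ashx");

            using (var client = new WebClient())
            {
                client.DownloadString(pingUrl);
            }

            _lastPing = DateTime.UtcNow;
        }
    }
}

Calling SitemapPinger.Ping() from the AddNewEvent handler covers the "ping whenever an event is added" part.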
I actually plan to implement this in the following OSS project: http://cyclemania.codeplex.com. I will let you know once it's done and you can have a look.
If you let your users add events to the website, you are probably using a database.
This means you can generate the XML sitemap at runtime like this:
create a page where your sitemap will be available (this doesn't need to be sitemap.xml but can also be sitemap.aspx or even sitemap.ashx).
open a database connection
loop through all records and create an XML element for each record (see the sketch further below)
This blog post should help you further: Build a Search Engine SiteMap in C#.
It is not using the new XElements from .NET 3.5, but it will work fine.
You can put this in an .aspx page, but adding an HttpHandler is probably better, as described in a different post on the same blog: (creating a httphandler for a sitemap)
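For completeness, here is a short sketch of those steps in handler form using XmlDocument (the pre-LINQ style the linked post uses). The connection string, query, and column names are placeholders for your own schema:

using System;
using System.Data.SqlClient;
using System.Web;
using System.Xml;

public class EventsSitemapHandler : IHttpHandler
{
    public bool IsReusable { get { return true; } }

    public void ProcessRequest(HttpContext context)
    {
        const string ns = "http://www.sitemaps.org/schemas/sitemap/0.9";

        var doc = new XmlDocument();
        doc.AppendChild(doc.CreateXmlDeclaration("1.0", "utf-8", null));
        XmlElement urlset = doc.CreateElement("urlset", ns);
        doc.AppendChild(urlset);

        // Open a database connection and add one <url> element per record.
        using (var conn = new SqlConnection("...your connection string..."))
        using (var cmd = new SqlCommand("SELECT EventId, LastModified FROM Events", conn))
        {
            conn.Open();
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    XmlElement url = doc.CreateElement("url", ns);

                    XmlElement loc = doc.CreateElement("loc", ns);
                    loc.InnerText = "http://www.mysite.com/events.aspx?id=" + reader["EventId"];
                    url.AppendChild(loc);

                    XmlElement lastmod = doc.CreateElement("lastmod", ns);
                    lastmod.InnerText = ((DateTime)reader["LastModified"]).ToString("yyyy-MM-dd");
                    url.AppendChild(lastmod);

                    urlset.AppendChild(url);
                }
            }
        }

        context.Response.ContentType = "text/xml";
        doc.Save(context.Response.Output);
    }
}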
