How and where to add a robots.txt file to an ASP.NET web application?

I am using ASP.NET with C#.
To make my site easier to find in Google, I searched and found that a robots.txt file can help, but I have no idea how to create it or where in the file I should put my keywords, such as ASP.NET and C#.
Also, please let me know the steps needed to include it in my application.

robots.txt is a text file in the root folder that sets rules for search robots, mainly which folders they may access and which they may not. You can read more about it here: http://www.robotstxt.org/robotstxt.html

The robots.txt file is placed at the root of your website and is used to control where search spiders are allowed to go; for example, you may not want them in your /js folder. As usual, Wikipedia has a great write-up.
I think you may find Sitemaps more useful, though. A sitemap is an XML file that you produce to represent the content of your site. You then submit it to the main search engines. Although Sitemaps were started by Google, all the main search engines have now agreed to follow a standard schema.
Increasing your Google score, and SEO in general, isn't something I know much about. It sounds like a black art to me :) Check out the IIS SEO Toolkit, though; it may offer some pointers.

Most search engines will index your site unless a robots.txt file tells them not to. In other words, robots.txt is generally used to exclude robots from all or part of your site.
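To make the answers above concrete, here is a minimal sketch of what such a file could look like. It is a plain text file named robots.txt, saved in the root folder of the web application so that it is served from (domain)/robots.txt. The /js/ folder comes from the example above; the /admin/ folder and the example.com sitemap URL are hypothetical placeholders. There is no place in it for keywords such as ASP.NET or C#; it only holds crawl rules.

```text
# robots.txt (must sit in the site root, next to web.config)
# The rules below apply to every crawler.
User-agent: *
# Keep crawlers out of these folders (/admin/ is a made-up example).
Disallow: /js/
Disallow: /admin/

# Optional: tell crawlers where the XML sitemap lives (placeholder URL).
Sitemap: http://www.example.com/sitemap.xml
```

Anything not matched by a Disallow line stays crawlable, so an empty or missing robots.txt simply means "index everything".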

Related

How to rewrite URLs in ASP.NET

I have been using Helicon to rewrite my URLs, and the rules are in a file named htaccess (no dot). A rewrite rule looks something like:
RewriteRule /e-commerce /e-commerce.asp [I,U]
I have read a few answers, starting with How to Determine the Installed ASP.NET Version of Host from a Web Page. I ran the page, and it displayed 2.0.50727.3643.
A little history so maybe one can be gentle. I was a Microsoft Frontpage MVP, but disliked their Frontpage Server Extensions (FPSE). Some hosting companies are still using them, but the last ones were back in 2002.
I was a Microsoft guy, so I went with Microsoft servers and started using ASP includes. Then I came across Helicon and used it for 4-5 years. Some of my sites are having no issues, but some of them are. My new prices are out, along with new hardware for credit card processing, and I really need help. (BTW, I looked for an e-commerce section but found nothing; if y'all have one, I'll be more than happy to help.)
I do not even know what file name I should be using or what information goes in it.
Rename a file in C#
How to rename a file in .NET?
Rewriting URLs in ASP.NET/C#
Custom Url Rewriting in asp.net
I have seen several file names but I do not know which one to use. I am sure there is a question out there that matches mine, but after looking for several hours, I am hoping some of the experts will be able to help me out.
Thank you!
You should give URLRewriter.Net a try. It's very easy to integrate into an ASP.NET project. It implements URL rewriting at the ASP.NET level instead of the IIS level.

Access to sitemap?

I created a site map named "Web.sitemap" in the root folder, and I need to feed it to Google keywords. Any idea how I can access this file? I tried (domain)/Web.sitemap, but it doesn't load.
What is the proper way to access this file?
Thanks
Web.sitemap is typically used by the site map controls in ASP.NET to render menus and the like. It is not exposed publicly, and in fact the default IIS configuration will block it from being loaded through the browser.
You may be thinking of a sitemap.xml file, which is an XML description of every page on your site used by search engines and crawlers. More information on this can be obtained from http://www.sitemaps.org/protocol.php
I'm not sure what you mean by feeding it to "Google keywords". But if you want to submit a sitemap to Google Webmaster Tools (and to search engines in general), what you want is an XML sitemap that follows the XML sitemaps protocol (as Mike wrote).
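For comparison with Web.sitemap, a minimal sitemap.xml following the sitemaps.org protocol might look like the sketch below. The example.com URLs, the page name, and the date are hypothetical placeholders. Only the loc element is required for each url entry; lastmod, changefreq, and priority are optional hints.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per page you want crawlers to know about -->
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2010-01-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>http://www.example.com/products.aspx</loc>
  </url>
</urlset>
```

Saved as sitemap.xml in the site root, a file like this loads at (domain)/sitemap.xml, and that address is what you would submit to the search engine.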

How to index a web site

I'm asking on behalf of somebody, so I don't have too many details.
What options are available for indexing site content in an ASP.NET web site? I suspect SQL Server's Full Text index may be used if the page content is stored in the database. How would I index dynamic and static content if that content isn't stored in the DB, but in html and aspx pages themselves?
We purchased Karamasoft Ultimate Search several years ago. It is a search engine add-on for your web site. I like it because it is a simple tool that taught us about searching on our site. It is pretty inexpensive, and we knew we could buy something else later if we needed more or different features. We needed something that would give us searching without having to do a lot of programming.
Specifically, this tool is a web crawler. It runs on your web server, acts like an end user, and navigates through your site keeping a record of your web pages, so when a real user searches, they are shown the pages that have the content they want.
Keep in mind that it is acting like an end user, so your dynamic data is indexed right along with the static stuff, because it indexes the final web page. We needed this feature, and it is what appealed to us the most.
You can use a web crawler to crawl that site and add the content to a database, which is then full-text indexed. There are a number of web crawlers out there.
Lucene is a well-known open source tool that would help you here. The main branch is Java based, but there is a .Net port too.
Main site: http://lucene.apache.org/
.Net port: http://incubator.apache.org/lucene.net/
Having used several alternatives I would be loath to do anything other than Google Site Search.
The only reason I use SQL Full Text Search is to search through multiple columns. It's really hard to implement it in any effective manner.

Why are the pages of some dynamic websites in HTML format?

I've seen a lot of dynamic websites on the internet whose pages have an html or htm extension. I don't get why that is. And how do they do that?
Just look at this website: http://www.realmadrid.com/cs/Satellite/en/Home.htm
What you see in the URL can be set at will by the people running the web site. The technique is called URL rewriting.
How
On Apache, the most popular solution to that is the mod_rewrite module.
Seeing as you've tagged ASP.NET: as far as I know, ASP.NET has only limited rewriting support out of the box. This blog entry promises a complete URL rewriting solution in ASP.NET 2.0.
Why
As for the why, there is no compelling technical reason to do this.
It's just that htm and html are the recognized standard extensions for HTML content, and many (including myself) think they simply look nicer than .php, .php5, .asp, .aspx and so on.
Also, as Adam Pope points out in his answer, this makes it less obvious which server side technology/language is used.
The .html/.htm extension has the additional effect that if you save it to disk, it is usually automatically connected with your installed browser.
Maybe (a very big maybe) there are some very simple-minded client programs around that decide they have to parse HTML by looking at the extension. But that would be a blatant violation of the rules and was hopefully last seen in 1994. Anyway, I don't think this is the case any more.
There are a number of potential reasons. These may include:
They could be trying to hide the technology they built the site with.
They could be serving a cached version of a page which was written out to HTML.
They could simply perceive it to look friendlier to the user.
They might be using a server-side scripting language like PHP or ASP. You can configure which file extensions get parsed by the language by editing the web server configuration files.
For example, in PHP the default extension is .php, but you could configure the server to use .html. That would mean any file with the .html extension could contain PHP code, which would get parsed before the page is sent to the client's web browser.
This is generally not recommended, as it adds overhead: .html pages that don't contain any PHP would be parsed by the PHP engine anyway, which is slower than serving them directly to the browser.
The other way would be to use some form of URL rewriting. See URL Rewriting in ASP.NET
Another reason is SEO (search engine optimization). Many search engines like HTML pages, and some SEO specialists think the html extension can improve the rank of their content in search engines.
One possibility is just historical reasons. Pages that started out static are now generated dynamically, but sites don't want to break existing customers' favorites.
They keep some pages as HTML because their content is not supposed to change frequently, or at all.
But you should also keep in mind that some sites are dynamic and only change the page extension to html, while the original page stays the same (e.g., php or aspx), using htaccess or a framework such as CodeIgniter.
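As a rough sketch of the htaccess approach on Apache with mod_rewrite, assuming a site whose pages are really PHP scripts (the about.html and about.php names are hypothetical), the rules could look like this:

```apache
RewriteEngine On
# If no real .html file exists on disk for the requested URL...
RewriteCond %{REQUEST_FILENAME} !-f
# ...serve /about.html from about.php without changing the visible URL.
RewriteRule ^(.+)\.html$ $1.php [L]
```

The visitor only ever sees the .html address; the rewrite happens inside the server. On ASP.NET the same idea is implemented with a URL rewriting module rather than with these directives.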

asp.net help resource location

I have a relatively simple site that I'm working up for an intranet environment. The pages have a hook to display a simple bit of text (possibly with a bit of HTML for markup purposes) for help when the user clicks a link on the page. I'm debating whether to put the help snippets in their own XML file or create a section in web.config. The site is to be deployed across several client sites and given that updating a web.config file appears to restart the site, I'm leaning toward having it in its own file. My question is where would be the best place to locate it? I'd rather it weren't easily web-accessible, so although root or some folder is an option, I'm wondering if there is a more "standard" location for files like this, App_LocalResources perhaps? Any feedback would be welcome. Thanks.
I will look at these options. I don't anticipate a lot of updates to the help file/resource, but I think as the function of the site expands, it's certainly possible. I like the idea of it being something like XML or at least editable in a text editor so that updating doesn't necessarily require VS to update the file. Thanks all!
Sounds like a perfect candidate for resx (resource) files in the App_GlobalResources folder. Those are easily editable and posted to a site without any restarts.
What about using an embedded resource? There are several tutorials on how to use embedded resources and package them up in a DLL to distribute along with your website.
In my projects, I connect the web application to an online help wiki wherever possible.
The .aspx page name is used as the help page title. Once you are in the wiki, you are free to do all the wiki tricks, such as redirecting and linking.
See my blog entry for technical details.
