Creating a robots.txt for an ASP.NET MVC site - asp.net

I'm creating a robots.txt file for my website, but looking through my project structure, I'm not sure what to disallow.
Do I need to disallow standard .NET MVC directories and files like /App_Data, /web.config, /Controllers, /Models, /Global.asax? Or will those not be indexed already?
What about directories like /bin and /obj?
If I want to disallow a page, do I disallow /Views/MyPage/Index.cshtml, or /MyPage?
Also, when specifying the sitemap in the robots.txt file, can I use my Web.sitemap, or does it need to be a different xml file?

'robots.txt' refers to paths as they are publicly seen by Web crawlers.
There's nothing particularly special about a crawler: it merely uses HTTP to request pages from your site precisely like a user does.
So, given that your MVC site is properly configured, files like /web.config or the paths you mention won't be visible to the outside world, as neither IIS nor your application will be configured to serve them. Even if a spider were pointed at those files, it would receive an error response such as 404 Not Found and continue.
Similarly, your .cshtml or .aspx content files won't be seen with those extensions. Rather, a Web crawler will see precisely what you show to users. To block a page, disallow its public URL (for example /MyPage), not the path of the view file.
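To illustrate that rules are matched against public URL paths rather than project files, here is a minimal sketch using Python's standard urllib.robotparser module. The host example.com, the /MyPage and /Home/About routes, and the sitemap URL are placeholders, not details from the question; the Sitemap line is assumed to point at a sitemap file served at a public URL.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an MVC site. /MyPage is the public route,
# not the location of the view file on disk.
ROBOTS_TXT = """\
User-agent: *
Disallow: /MyPage
Sitemap: https://example.com/sitemap.xml
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content supplied as a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The route a crawler actually requests is blocked.
blocked = parser.can_fetch("*", "https://example.com/MyPage")      # False
# Any other public route remains crawlable.
allowed = parser.can_fetch("*", "https://example.com/Home/About")  # True
# Sitemap URLs declared in the file (site_maps() needs Python 3.8+).
sitemaps = parser.site_maps()
```

Directories such as /bin or /App_Data do not need entries here if the server never serves them in the first place.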

Related

How do people make an ASP.NET page with an .html file name in the URL?

I have seen an ASP.NET application whose URL looks like this:
http://xxxxxxxxx/FILENAME.html?xxxx=xxx
How come it is an .html file and not an .aspx file? How did they do it?
I heard from my manager that it's an ASP.NET project he outsourced.
Sometimes I have seen web pages ending in .html too, even though they are obviously generated dynamically...
An ASP.NET application can optionally include files ending in .html. These are static HTML pages without any code-behind and can be part of any web application. They are not parsed or compiled by the server but simply sent as good old predefined HTML.
You can also configure the web server so that it routes requests with other extensions through the ASP.NET rendering engine. This way you can keep the widely recognized .html extension and still have dynamic page generation.
The file extension is not necessarily tied to the execution engine. You can make ASP.NET process .aspx, .html, .htm, .bob, .foobar, .css, etc.
There are multiple ways to do this:
In IIS manager, set the file extension mapping for .html to point to ASP.NET. If you're using MVC, you can handle this via routing.
Use a rewrite engine to map anything with a .htm* extension to .aspx
There are probably other ways, but these are the most direct.
Also, the .html extension doesn't mean that the file was dynamically generated.
You can use URL rewriting. There are a lot of different rewriters, the most popular being the URL Rewrite module ( http://www.iis.net/download/urlrewrite ) and the built-in (in ASP.NET 4.0) Routing Engine ( http://msdn.microsoft.com/en-us/library/cc668201.aspx ).
The URL Rewrite module is external to your application and translates incoming URLs to regular .aspx URLs. You are responsible for generating the links with .html yourself. It is a good fit if you are adding it to an existing application.
The built-in routing can generate URLs based on routes and is (usually) configured in code in Global.asax.
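The rewriting idea described above can be sketched in a few lines. This is a language-neutral illustration written in Python, not the configuration syntax of the IIS URL Rewrite module; the function name rewrite_to_aspx and the rule itself (any path ending in .htm or .html maps to the same name with .aspx) are assumptions made for the example.

```python
import re

# Matches a trailing .htm or .html on the path part of a URL.
_HTML_SUFFIX = re.compile(r"\.html?$", re.IGNORECASE)


def rewrite_to_aspx(url: str) -> str:
    """Map /name.htm or /name.html to /name.aspx, keeping the query string."""
    # Split off the query string so only the path is rewritten.
    path, separator, query = url.partition("?")
    new_path = _HTML_SUFFIX.sub(".aspx", path)
    return new_path + separator + query


rewritten = rewrite_to_aspx("/FILENAME.html?xxxx=xxx")
```

The visitor keeps seeing the .html address; only the internal target of the request changes.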
Right click on the project.
Add new...
Pick the HTML file type.
Some people prefer to use a different extension (or even none at all) in order to hide the technology used to develop the site.
Bear in mind that you would have to configure IIS properly to let the .NET engine handle the .html file type.

Is default.aspx a .Net equivalent to an "index" file?

I have just started working for a new company as a web developer. Previous research led me to find out that their site is built in ASP.NET, which isn't a problem; I just don't have any experience with it. All my experience is in HTML, CSS, PHP and JS.
Upon gaining access via FTP, I noticed there is no traditional index.bla, so I went to the homepage of their website, and instead of index, it was default.aspx.
Is this "default.aspx" file the .Net replacement / equivalent of an index file, and does it work in the same way?
Yes. In IIS (the web server) you can specify which files will be shown when a directory (like the root, when accessed through http://www.sitename.tld/) is requested.
You can configure which files will be shown and in what order. In IIS 6, for example, the list might begin with "Default.htm" followed by "Default.asp".
So when a user requests a directory on that site, IIS will search for "Default.htm"; if that isn't found, it will look for "Default.asp", and so on. If none of the default documents are found, you will either see the directory's contents (disabled by default) or an error saying you can't see the directory's contents.
In Apache this is set through the DirectoryIndex directive in httpd.conf.
Yes. index is an arbitrary name that Apache defaults to. The index page can be named anything, and with IIS it is usually default.
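The lookup order described in the answers above can be modelled as a simple ordered search. This is a conceptual sketch in Python, not IIS or Apache code; the function name resolve_default_document is hypothetical, and the document list passed in the example is illustrative rather than a server's actual default list.

```python
from typing import Iterable, Optional


def resolve_default_document(files_in_dir: Iterable[str],
                             default_documents: Iterable[str]) -> Optional[str]:
    """Return the first configured default document present in the directory."""
    # File names on Windows are case-insensitive, so compare in lower case.
    present = {name.lower(): name for name in files_in_dir}
    for candidate in default_documents:
        match = present.get(candidate.lower())
        if match is not None:
            return match
    # Nothing matched: the server would list the directory or return an error.
    return None


chosen = resolve_default_document(
    ["default.aspx", "about.aspx"],
    ["Default.htm", "Default.asp", "Default.aspx"],
)
```

Here chosen is "default.aspx", because the two entries earlier in the list do not exist in the directory.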

Custom VirtualPathProvider unable to serve URLs ending with a directory

As part of a CMS, I have created a custom VirtualPathProvider which is designed to serve a single file in place of an actual file structure. I have it set up such that if a file actually exists on the server, that file will be served. If the file does not exist, the virtual content stored for that address will be served instead. This is similar to the concept of serving a website from files stored in a database, though in this case the content is stored in XML files on the server.
This setup works perfectly when a request is made to a specific page. For example, if I ask for "www.mysite.com/foobar.aspx", the content that is stored for "foobar.aspx" will be served. Further, if I ask for "www.mysite.com/subdir/foobar.aspx", the appropriate content will also be served.
The problem is this: If I ask for something like "www.mysite.com/foobar", things begin to fall apart. If the directory exists on disk (and doesn't have a configured default page in IIS, such as index.aspx), I will get a "Directory Listing Denied" error. If the directory does not exist, I'll simply get a 404 - Resource Not Found.
I've tried several things, and so far nothing I've done has made a bit of difference. It seems as though IIS is simply noting the nonexistence of a directory (or default file in an existing directory) and serving up its own error code, without ever asking my application what to do with the request. If it ever did get to the application, I would be able to solve the problem, but as it stands, I'm quite lost. Does anyone know if there is some setting in IIS that is causing this?
I've looked for every resource I can find on the subject, and am coming up empty. I know this should be possible, because I have read tutorials on serving content from both databases and ZIP files. HELP!
p.s., I am running IIS6 and .NET 3.5
IIS will only pass a request to the ASP.NET process if it is configured to do so for the particular extension. The default is aspx, ascx, etc. In other words, if you request a .html file, ASP.NET will never see that HTTP request. The same applies to a request with no extension.
To change this behavior, add a wildcard mapping to the ASP.NET process. Load IIS Manager, go to the Properties for your web site and look at the Home Directory tab. Click on "Configuration" and there you will see the extension-to-application mappings.
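The dispatch decision described above can be modelled, very roughly, as a lookup from file extension to handler. This toy Python sketch is not IIS code: the handler names, the short mapping table and the pick_handler function are illustrative assumptions, and the real ordering between wildcard and extension-specific mappings is not modelled.

```python
import posixpath
from typing import Mapping, Optional

# Illustrative mapping only; a real IIS 6 site has a longer list.
ASPNET_EXTENSIONS = {".aspx": "aspnet_isapi", ".ascx": "aspnet_isapi"}


def pick_handler(url_path: str,
                 mappings: Mapping[str, str],
                 wildcard: Optional[str] = None) -> str:
    """Return the name of the handler that would receive the request."""
    extension = posixpath.splitext(url_path)[1].lower()
    if extension in mappings:
        return mappings[extension]
    if wildcard is not None:
        # A wildcard mapping catches everything else, including no extension.
        return wildcard
    # Without a mapping, the web server handles the request by itself.
    return "static-file"


without_wildcard = pick_handler("/foobar", ASPNET_EXTENSIONS)
with_wildcard = pick_handler("/foobar", ASPNET_EXTENSIONS, wildcard="aspnet_isapi")
```

Without the wildcard, "/foobar" never reaches the ASP.NET handler in this model, which mirrors why the VirtualPathProvider is never consulted for that request.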

web.config ignoring certain files from requiring authentication

In my ASP.NET web application, I have a folder containing a few HTML and JPEG files. Some of these files do not require the user to log in, while the others do. How do I exclude the files that are free for view to be displayed without logging in while still maintaining the user to login for viewing other files in the same folder using just the config file. I wasn't able to find anything relevant in the config file, or maybe I overlooked it. If anyone knows, please reply.
Thanks.
I've tried to answer this as well as I can but the sentence:
How do I exclude the files that are free for view to be displayed without logging in while still maintaining the user to login for viewing other files in the same folder using just the config file.
..is a bit confusing!
The files that require authentication are the ones handled by the ASP.NET handler, such as .aspx files. JPEGs and other static files bypass it, so they can be viewed without authentication. The aspnet_isapi handler only handles certain file types, but you can configure it to handle more extensions (or all files) by configuring extension mappings in IIS.
Personally, I would put all the files I wanted unprotected in a folder whose permissions allow anyone to view it, set the aspnet_isapi handler to handle all files, and then protect the other folders according to the application's needs.
Depending on what you want to do (as your question isn't that clear), you may or may not be able to achieve what you want just from the config file but hopefully this answer will give you the information you need to make your own conclusions on that.

Programmatically deciding what file a URL should point to with ASP.NET 3.5 and IIS 7

Is it possible to programmatically resolve a URL to a file using ASP.NET and IIS? Specifically I'd like the file to be outside of my Virtual Directory (could be anywhere on the local file system). So if a URL comes in like http://mysite/somepicture.jpg I'd like to be able to return c:\mypicture.jpg. I looked into creating an IHttpModule for URL rewriting but that isn't quite what I need - it's limited to URLs within the existing site.
You cannot achieve this with URL rewriting, because the file is not hosted on your Web site. You should use the Response.WriteFile method in an HttpModule or HttpHandler to manually stream the file to the user.
I would like to add to Mehrdad's response: make sure your app has rights to the folder that contains the files you want to serve. That way you can serve them as Mehrdad suggested.
