What is the default file to be visited behind a website? - http

When I open a website's URL such as www.stackoverflow.com via curl, which file is actually being visited on the server? I know it is usually index.html, but I cannot find such a convention in the RFC 2616 document. How can I know it?

The document delivered when a website is requested without a path in the URL is determined by the web server's configuration, so there is no standard for it. It is the server administrator's choice.
curl will download whatever the web server delivers, or, if the web server responds with a redirect, follow that redirect (when the -L option is given).

There is no way for the client to know how the data for the HTTP response was generated. It might not even be related to a specific file.
The last time I wrote a significant bit of server-side code, everything outside of /static/ was routed (via mod_rewrite) through a FastCGI program that got its data from a few different controller libraries, a dozen database schema libraries, a database, and a dozen template files.
The WWW is built on links between URLs, not files. Don't worry about files if you are writing client code.
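To make the point that a response does not have to come from a file, here is a minimal sketch using Python's standard-library WSGI support. The app and its output are hypothetical; it builds every page in memory from the request path, so there is no index.html (or any other file) behind it.

```python
from html import escape
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Hypothetical WSGI app: every page is generated in memory; no file is read."""
    path = environ.get("PATH_INFO", "/")
    body = f"<h1>You asked for {escape(path)}</h1>".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call(path):
    """Invoke the app directly, as a WSGI server would, and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fill in the required CGI-style keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body
```

To try it with curl, you could serve it with wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever() and request http://localhost:8000/. curl receives HTML, and nothing in the response tells it that no file was involved.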

It's not necessarily index.html, and you can't actually know: it could be anything, depending on the server configuration. For instance, in Apache you can change the directory index to whatever suits you:
DirectoryIndex home.php
In this case, the default file served is home.php.
In IIS the equivalent setting is called the default document, and you can change it as well.
The usual defaults are:
In Apache:
index.php (common, but only if the server configuration adds it)
index.html (the default that comes with a fresh install)
In IIS:
Default.htm
Default.asp
Index.htm
Index.html
Iisstart.htm
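Both servers follow the same idea: try an ordered list of names in the requested directory and serve the first one that exists. A minimal Python sketch of that lookup, assuming a hypothetical DEFAULT_DOCUMENTS list (a real server reads the list from its configuration, e.g. DirectoryIndex in Apache):

```python
import os
from typing import Optional, Sequence

# Hypothetical ordered list in the spirit of Apache's DirectoryIndex or
# IIS's default documents; a real server reads this from its configuration.
DEFAULT_DOCUMENTS = ("index.php", "index.html")


def resolve_default_document(directory: str,
                             candidates: Sequence[str] = DEFAULT_DOCUMENTS) -> Optional[str]:
    """Return the path of the first candidate file that exists in directory.

    Candidates are tried strictly in order, so the first match wins. None means
    no default document was found; a server would then show a directory listing
    or an error, depending on its configuration.
    """
    for name in candidates:
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return path
    return None
```

With DirectoryIndex home.php, the candidate list would simply be ("home.php",), which is why the same directory can have a different "default file" on a differently configured server.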

Related

Web root option in Varnish Cache similar to `root` in nginx

In nginx we have the root option to serve files from a specific directory, e.g. root /var/www/data/ in the nginx conf. If my URL is https://mydom.com/$file_name, nginx will look for the file /var/www/data/$file_name and return it if present, otherwise return 404.
Now I want a similar option in Varnish. Is there a way to serve files from a specific directory? How can I tell Varnish to look for files in a specific directory and return them?
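The root behaviour described in the question can be sketched in Python as "join the URL path onto a directory, then 200 if the file exists, else 404". This is an illustration of the concept, not nginx's implementation; map_to_root and serve are hypothetical names, and this sketch clamps ".." segments inside the root, whereas a real server may reject such requests outright.

```python
import os
import posixpath
from urllib.parse import unquote


def map_to_root(root: str, url_path: str) -> str:
    """Join a URL path onto a document root, as a root directive conceptually does.

    Percent-escapes are decoded and ".." segments are collapsed before joining,
    so the result cannot point outside root.
    """
    decoded = unquote(url_path)
    # Force exactly one leading slash, then normalize; normpath cannot climb above "/".
    normalized = posixpath.normpath("/" + decoded.lstrip("/"))
    return posixpath.join(root, normalized.lstrip("/"))


def serve(root: str, url_path: str) -> int:
    """Return an HTTP status code: 200 if the mapped file exists, else 404."""
    return 200 if os.path.isfile(map_to_root(root, url_path)) else 404
```

For example, map_to_root("/var/www/data", "/img/logo.png") gives "/var/www/data/img/logo.png". The sketch assumes the query string has already been stripped from url_path and uses POSIX paths only.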
Varnish is a cache, not a webserver. Varnish doesn't serve pages from a document root on the disk, but it caches responses that came from a pre-defined backend server.
Although Varnish and Nginx have some similarities, and cover some of the same use cases, they are entirely different products.
However, if you use Nginx as a reverse proxy, instead of a webserver, it won't use the root option either.
There is one way you can make Varnish act like a webserver, and that is by leveraging the file module in Varnish Enterprise. This allows Varnish to serve files from disk, but this is not available in the open source version of Varnish, only in the commercial version.
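To make the cache-versus-webserver distinction concrete, here is a toy Python model of a read-through cache. ReadThroughCache and the backend function are hypothetical; real Varnish adds TTLs, invalidation, and full HTTP semantics that this ignores. The point is that the cache never looks at a directory: on a miss it asks the backend, on a hit it answers from memory.

```python
from typing import Callable, Dict


class ReadThroughCache:
    """Toy model of a caching proxy: it stores responses, it does not read files.

    On a miss it fetches the response from a predefined backend and stores it;
    on a hit it answers from memory without contacting the backend.
    """

    def __init__(self, backend: Callable[[str], bytes]):
        self.backend = backend  # hypothetical stand-in for the origin server
        self.store: Dict[str, bytes] = {}
        self.backend_calls = 0  # counts how often the origin was contacted

    def get(self, url: str) -> bytes:
        if url not in self.store:
            self.backend_calls += 1
            self.store[url] = self.backend(url)
        return self.store[url]
```

Whatever produces the bytes (an nginx root, a PHP app, anything else) lives behind backend; the cache itself has no notion of a document root.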

nginx - Completely case-insensitive URL matching and file lookup

I want all URLs on my server to be case-insensitive, in both directions. By that I mean: if the user requests index.html but the file is called Index.html, they should still get it. If they request Index.html but it's called index.html, they should still get it.
My server runs on Linux whose file system is case-sensitive by default, but can this be worked around by nginx?
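The lookup logic such a setup needs can be sketched in Python for a single path component. This does not rely on any nginx feature; find_case_insensitive is a hypothetical helper. Note that it lists the directory on every request, which is slow for large directories, and that two files differing only by case make the result ambiguous.

```python
import os
from typing import Optional


def find_case_insensitive(directory: str, requested: str) -> Optional[str]:
    """Find a file in directory whose name equals requested, ignoring case.

    Handles a single path component only. If several files differ only by case
    (e.g. index.html and Index.html), an exact-case match is preferred;
    otherwise the first match in sorted order is used. Returns None if nothing matches.
    """
    wanted = requested.casefold()
    matches = [
        name for name in sorted(os.listdir(directory))
        if name.casefold() == wanted and os.path.isfile(os.path.join(directory, name))
    ]
    if not matches:
        return None
    best = requested if requested in matches else matches[0]
    return os.path.join(directory, best)
```

Because the returned path uses the name as stored on disk, a request for index.html resolves to Index.html and vice versa, which is the two-way behaviour asked for.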

How do Google searches show results for WordPress posts?

As we know, WordPress stores its posts and pages in a database, not as physical pages. How, then, can Google show results from posts that do not exist physically?
Also, if we do the same, will it work or not?
Please clarify.
Short answer:
A web-address is ultimately just a string (a piece of text) given to a web-server, which can interpret and act on it any way it likes.
It can simply map that string to a file-system path and see if it matches a file on disk and return that file to the website visitor.
But it can also instead use that string to do something completely different - such as looking it up in a database and then returning database content.
A web-server is not only (old-school) server software like Apache and IIS, which defaults to serving filesystem content; it also includes server-side programs like PHP scripts, Node.js applications, and so on.
Step-by-step explanation:
A website visitor (a human's web-browser, a search-engine spider, a bot, etc.) requests GET http://example.wordpress.com/2019/10/12/lorem-ipsum
The TCP packet with the request reaches physical computers owned or operated by WordPress.com.
(This answer will ignore complications like network-load-balancing, application-level routing, HTTP reverse-proxies, and so on.)
The physical computer's operating system routes that network packet to the "outer" webserver software, such as Apache or nginx.
Apache or nginx mainly looks at the GET and /2019/10/12/lorem-ipsum parts of the request, which are the Method and Path components respectively (it also uses the Host header to pick the right site).
If the "outer" web-server is configured to map the website's root to a filesystem directory, such as /var/www (a common Apache root on Linux; many distributions use /var/www/html) or C:\inetpub\wwwroot (the default root for IIS on Windows), then it will check whether a file named lorem-ipsum exists in /var/www/2019/10/12 (or C:\inetpub\wwwroot\2019\10\12\lorem-ipsum on Windows).
But WordPress.com does not do this.
Most modern web-applications built today also do not do this, because exposing raw files directly to the internet generally isn't a good idea (but it's still okay for "static file websites", of course).
Instead, WordPress.com is specifically configured to pass the entire 2019/10/12/lorem-ipsum string into php.exe along with the path to the entrypoint script file of WordPress.com's namesake PHP web-application.
This is actually just WordPress' index.php file; however, requests reach it through URL rewriting rules (such as Apache's mod_rewrite) rather than by naming it in the URL, which is why you don't see URIs like example.wordpress.com/index.php/2019/10/12/lorem-ipsum.
Then php.exe runs the script, which looks up 2019/10/12/lorem-ipsum in a database, retrieves the content, renders the page as HTML, and returns the rendered HTML to the website visitor from the first step.
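The "single entry point that interprets the path string" idea is often called the front-controller pattern. Below is a minimal Python sketch of it; this is not WordPress's actual code, and POSTS (standing in for the database), POST_PATH, and front_controller are hypothetical names.

```python
import re
from typing import Dict, Tuple

# Hypothetical stand-in for the posts table in a database.
POSTS: Dict[Tuple[str, str, str, str], str] = {
    ("2019", "10", "12", "lorem-ipsum"): "Lorem ipsum dolor sit amet.",
}

# Matches paths like /2019/10/12/lorem-ipsum (year/month/day/slug), optional trailing slash.
POST_PATH = re.compile(r"^/(\d{4})/(\d{2})/(\d{2})/([a-z0-9-]+)/?$")


def front_controller(path: str) -> Tuple[int, str]:
    """Single entry point for every request: decide from the path string what to do."""
    match = POST_PATH.match(path)
    if match is None:
        return 404, "<h1>Not Found</h1>"
    content = POSTS.get(match.groups())  # "database" lookup keyed by date and slug
    if content is None:
        return 404, "<h1>Not Found</h1>"
    return 200, f"<article>{content}</article>"
```

No file named lorem-ipsum exists anywhere, yet the URL returns a page; that is also all a search-engine spider needs in order to index it.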
Better answer:
Think more abstractly and challenge your assumptions:
URLs do not point to web-pages. They actually point to "resources".
A "resource" is not necessarily a web-page. A resource is a representation of some "thing".
A web-page, too, can itself be a representation of some "thing".
A web-page is not necessarily a static HTML file on-disk.
A web-page can be generated dynamically on-the-fly by server-side software like PHP, NodeJS, ASP.NET, Java/JSP/Servlets, CGI (archaic), and so on.
In WordPress' case:
The URL http://example.wordpress.com/2019/10/12/lorem-ipsum points to a WordPress article.
But a WordPress article can be represented in different ways, such as an HTML web-page (as in this example), but it could also be represented as a JSON blob or XML blob (for consumption by other computer programs).
That WordPress article could also be represented as a part of another resource, such as a link when you request all articles published in October 2019 (by getting http://example.wordpress.com/2019/10).
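The "one resource, several representations" idea can be sketched in Python by choosing the output format from the request's Accept header. This is only an illustration of the general idea, not a claim about how WordPress chooses between representations; ARTICLE and represent are hypothetical, and the Accept handling is deliberately simplified.

```python
import json
from html import escape
from typing import Dict, Tuple

# Hypothetical article: one "thing" that can be rendered in several ways.
ARTICLE: Dict[str, str] = {
    "title": "Lorem ipsum",
    "date": "2019-10-12",
    "body": "Dolor sit amet.",
}


def represent(article: Dict[str, str], accept: str) -> Tuple[str, str]:
    """Return (content_type, body) for the same resource, chosen from the Accept header.

    Simplified on purpose: it only checks whether "application/json" appears,
    ignoring the q-values and wildcards that real content negotiation honors.
    """
    if "application/json" in accept:
        return "application/json", json.dumps(article)
    html_body = f"<h1>{escape(article['title'])}</h1><p>{escape(article['body'])}</p>"
    return "text/html; charset=utf-8", html_body
```

The article's data stays the same; only the rendering changes, which is what makes "resource" a better mental model than "file" or even "web-page".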

NGINX Rewrite Rule without access to the configuration file

In Apache, rewrite rules can be written in the configuration file or in an .htaccess file. What about nginx? Can I use URL rewriting without access to the configuration file?
Unfortunately, you can't. This is one of the reasons shared hosting providers typically use Apache or LiteSpeed, not nginx or lighttpd.
A (very ugly) workaround would be to handle all requests with a script that contains the rewrite rules and serves the file or script matching the request URI (and which a user could modify without root privileges). However, you'd get bad performance serving static files, and you'd need to handle all the request headers in this script, which is not very practical.
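The rule-matching part of that workaround could look like the Python sketch below: an ordered list of user-editable regex rules applied to the request URI, first match wins. REWRITE_RULES and rewrite are hypothetical; the sketch leaves out the expensive parts the answer warns about (actually serving files and handling request headers).

```python
import re
from typing import List, Optional, Sequence, Tuple

# Hypothetical rewrite rules a user could edit without root access:
# (pattern, replacement) pairs tried in order; the first match wins.
REWRITE_RULES: List[Tuple[str, str]] = [
    (r"^/blog/(\d+)/?$", r"/post.php?id=\1"),
    (r"^/about/?$", r"/about.html"),
]


def rewrite(uri: str, rules: Sequence[Tuple[str, str]] = REWRITE_RULES) -> Optional[str]:
    """Return the rewritten target for uri, or None if no rule matches."""
    for pattern, replacement in rules:
        if re.match(pattern, uri):
            return re.sub(pattern, replacement, uri, count=1)
    return None
```

For example, rewrite("/blog/42") yields "/post.php?id=42". Because every request, including requests for images and CSS, would pass through such a script, the performance cost mentioned above is hard to avoid.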

Is default.aspx a .Net equivalent to an "index" file?

I have just started working for a new company as a web developer. Previous research led me to find out that their site is built in ASP.NET, which isn't a problem; I just don't have any experience in it. All my experience is in HTML, CSS, PHP and JS.
Upon gaining access via FTP, I noticed there is no traditional index.* file, so I went to the homepage of their website, and instead of index, it was default.aspx.
Is this "default.aspx" file the .Net replacement / equivalent of an index file, and does it work in the same way?
Yes. In IIS (the web server) you can specify which files will be shown when a directory (like the root, when accessed through http://www.sitename.tld/) is requested.
You can configure which files will be shown and in what order.
So when a user requests a directory on that site, IIS will look for "Default.htm"; if that isn't found, it will look for "Default.asp", and so on. If none of the default documents are found, you will either see the directory's contents (directory browsing is disabled by default) or an error saying you can't see the directory's contents.
In Apache this is set through the DirectoryIndex directive in httpd.conf.
Yes. index is an arbitrary name that Apache defaults to. The index page can be named anything, and with IIS it is usually default.
