Should the server treat such endpoints as different? [duplicate] - http

When should a trailing slash be used in a URL? For example - should my URL look like /about-us/ or like /about-us?
I am fully aware of the SEO-related issues - duplicate content and the canonical thing; I'm trying to figure out which one I should use in the context of serving pages correctly alone.
For example, my colleague is thinking that a trailing slash at the end means it's a "folder" - a "directory", so this is not a correct style. But I think that without a slash in the end - it's not quite correct either, because it almost looks like a folder, but it isn't and it's not a normal file either, but a filename without extension.
Is there a proper way of knowing which to use?

It is not a question of preference. /base and /base/ have different semantics. In many cases, the difference is unimportant. But it is important when there are relative URLs.
child relative to /base/ is /base/child.
child relative to /base is (perhaps surprisingly) /child.

In my personal opinion trailing slashes are misused.
Basically the URL format came from the same UNIX format of files and folders, later on, on DOS systems, and finally, adapted for the web.
A typical URL for this book on a Unix-like operating system would be a file path such as file:///home/username/RomeoAndJuliet.pdf, identifying the electronic book saved in a file on a local hard disk.
Source: Wikipedia: Uniform Resource Identifier
Another good source to read: Wikipedia: URI Scheme
According to RFC 1738, which defined URLs in 1994, when resources contain references to other resources, they can use relative links to define the location of the second resource as if to say, "in the same place as this one except with the following relative path". It went on to say that such relative URLs are dependent on the original URL containing a hierarchical structure against which the relative link is based, and that the ftp, http,
and file URL schemes are examples of some that can be considered hierarchical, with the components of the hierarchy being separated by "/".
Source: Wikipedia Uniform Resource Locator (URL)
Also:
That is the question we hear often. Onward to the answers! Historically, it’s common for URLs with a trailing slash to indicate a directory, and those without a trailing slash to
denote a file:
http://example.com/foo/ (with trailing slash, conventionally a directory)
http://example.com/foo (without trailing slash, conventionally a file)
Source: Google WebMaster Central Blog - To slash or not to slash
Finally:
A slash at the end of the URL makes the address look "pretty".
A URL without a slash at the end and without an extension looks somewhat "weird".
You will never name your CSS file (for example) http://www.sample.com/stylesheet/ would you?
BUT I'm being a proponent of web best practices regardless of the environment.
It can be wonky and unclear, just as you said about the URL with no ext.

I'm always surprised by the extensive use of trailing slashes on non-directory URLs (WordPress among others). This really shouldn't be an either-or debate because putting a slash after a resource is semantically wrong. The web was designed to deliver addressable resources, and those addresses - URLs - were designed to emulate a *nix-style file-system hierarchy. In that context:
Slashes always denote directories, never files.
Files may be named anything (with or without extensions), but cannot contain or end with slashes.
Using these guidelines, it's wrong to put a slash after a non-directory resource.

That's not really a question of aesthetics, but indeed a technical difference. The directory thinking of it is totally correct and pretty much explaining everything. Let's work it out:
You are back in the stone age now or only serve static pages
You have a fixed directory structure on your web server and only static files like images, html and so on — no server side scripts or whatsoever.
A browser requests /index.htm, it exists and is delivered to the client. Later you have lots of - let's say - DVD movies reviewed and a html page for each of them in the /dvd/ directory. Now someone requests /dvd/adams_apples.htm and it is delivered because it is there.
At some day, someone just requests /dvd/ - which is a directory and the server is trying to figure out what to deliver. Besides access restrictions and so on there are two possibilities: Show the user the directory content (I bet you already have seen this somewhere) or show a default file (in Apache it is: DirectoryIndex: sets the file that Apache will serve if a directory is requested.)
So far so good, this is the expected case. It already shows the difference in handling, so let's get into it:
At 5:34am you made a mistake uploading your files
(Which is by the way completely understandable.) So, you did something entirely wrong and instead of uploading /dvd/the_big_lebowski.htm you uploaded that file as dvd (with no extension) to /.
Someone bookmarked your /dvd/ directory listing (of course you didn't want to create and always update that nifty index.htm) and is visiting your web-site. Directory content is delivered - all fine.
Someone heard of your list and is typing /dvd. And now it is screwed. Instead of your DVD directory listing the server finds a file with that name and is delivering your Big Lebowski file.
So, you delete that file and tell the guy to reload the page. Your server looks for the /dvd file, but it is gone. Most servers will then notice that there is a directory with that name and tell the client that what it was looking for is indeed somewhere else. The response will most likely be be:
Status Code:301 Moved Permanently with Location: http://[...]/dvd/
So, totally ignoring what you think about directories or files, the server only can handle such stuff and - unless told differently - decides for you about the meaning of "slash or not".
Finally after receiving this response, the client loads /dvd/ and everything is fine.
Is it fine? No.
"Just fine" is not good enough for you
You have some dynamic page where everything is passed to /index.php and gets processed. Everything worked quite good until now, but that entire thing starts to feel slower and you investigate.
Soon, you'll notice that /dvd/list is doing exactly the same: Redirecting to /dvd/list/ which is then internally translated into index.php?controller=dvd&action=list. One additional request - but even worse! customer/login redirects to customer/login/ which in turn redirects to the HTTPS URL of customer/login/. You end up having tons of unnecessary HTTP redirects (= additional requests) that make the user experience slower.
Most likely you have a default directory index here, too: index.php?controller=dvd with no action simply internally loads index.php?controller=dvd&action=list.
Summary:
If it ends with / it can never be a file. No server guessing.
Slash or no slash are entirely different meanings. There is a technical/resource difference between "slash or no slash", and you should be aware of it and use it accordingly. Just because the server most likely loads /dvd/index.htm - or loads the correct script stuff - when you say /dvd: It does it, but not because you made the right request. Which would have been /dvd/.
Omitting the slash even if you indeed mean the slashed version gives you an additional HTTP request penalty. Which is always bad (think of mobile latency) and has more weight than a "pretty URL" - especially since crawlers are not as dumb as SEOs believe or want you to believe ;)

When you make your URL /about-us/ (with the trailing slash), it's easy to start with a single file index.html and then later expand it and add more files (e.g. our-CEO-john-doe.jpg) or even build a hierarchy under it (e.g. /about-us/company/, /about-us/products/, etc.) as needed, without changing the published URL. This gives you a great flexibility.

Other answers here seem to favor omitting the trailing slash. There is one case in which a trailing slash will help with search engine optimization (SEO). That is the case that your document has what appears to be a file extension that is not .html. This becomes an issue with sites that are rating websites. They might choose between these two urls:
http://mysite.example.com/rated.example.com
http://mysite.example.com/rated.example.com/
In such a case, I would choose the one with the trailing slash. That is because the .com extension is an extension for Windows executable command files. Search engines and virus checkers often dislike URLs that appear that they may contain malware distributed through such mechanisms. The trailing slash seems to mitigate any concerns, allowing the page to rank in search engines and get by virus checkers.
If your URLs have no . in the file portion, then I would recommend omitting the trailing slash for simplicity.

Who says a file name needs an extension?? take a look on a *nix machine sometime...
I agree with your friend, no trailing slash.

From an SEO perspective, choosing whether or not to include a trailing slash at the end of a URL is irrelevant. These days, it is common to see examples of both on the web. A site will not be penalized either way, nor will this choice affect your website's search engine ranking or other SEO considerations.
Just choose a URL naming convention you prefer, and include a canonical meta tag in the <head> section of each webpage.
Search engines may consider a single webpage as two separate duplicate URLS when they encounter it with and without the trailing slash, ie example.com/about-us/ and example.com/about-us.
It is best practice to include a canonical meta tag on each page because you cannot control how other sites link to your URLs.
The canonical tag looks like this: <link rel="canonical" href="https://example.com/about-us" />. Using a canonical meta tag ensures that search engines only count each of your URLs once, regardless of whether other websites include a trailing slash when they link to your site.

The trailing slash does not matter for your root domain or subdomain. Google sees the two as equivalent.
But trailing slashes do matter for everything else because Google sees the two versions (one with a trailing slash and one without) as being different URLs.
Conventionally, a trailing slash (/) at the end of a URL meant that the URL was a folder or directory.
A URL without a trailing slash at the end used to mean that the URL was a file.
Read more
Google recommendation

Related

ASP.NET web forms, relative paths broken by multilingual Urls

I recently was asked to transfer a multilingual site from working with languages based on cookies, to removing the cookie containing the language in the URL, for better SEO, while keeping the default language without the "prefix".
So instead of "mysite.com/page", I would have "mysite.com/{language}/page" for every language except for the default.
And so I updated the routing in the project to include the "{language}/URL" in an additional MapPageRoute for each URL.
The issue now is with relative paths for scripts and images etc.
When I am not in default language, the URL of these requests contains an additional "/en" or "/fr" or something, and of course there is no such folder.
Is there any workaround for this? I really don't want to have to replace close to thousand paths with absolute, or check for language before declaring them.

Load Drupal Site on Any URL

I'm setting up access to a Drupal 7 site. The site sits alone on a box that answers to a number of domains and that number is likely to grow. What I'd like to do is to tell Drupal to load the site regardless of which actual domain brought us to the box (the rest of the URL will always be the same, of course). Currently most of those domains send me to the install page.
The problem is the lack of a directory (symlink) in the sites/ directory.
I can probably rewrite requests coming through alternate domains in Nginx, but I'm wondering whether there's an application level answer. As it stands right now, accessing the box/site by any domain other than the canonical domain sends me to the install page.
Is there anything I can do?
It looks to me that you didn't configure your Drupal site as the "default" one.
The file "sites/default/settings.php" is loaded if no better (more specific to the current request) settings file can be found in the sites/folder... This is in fact a "wildcard" config, so the best solution would be to move the site files to the default folder. See the multi-site documentation for more details.
If you can't do that, then you can use sites.php for the rewriting, but you will need to update it to add any new URL you want to match. There's a little shortcut though: you can add a bunch of rewrites such as
$sites['com'] = 'default';
$sites['net'] = 'default';
$sites['org'] = 'default';
...
which will act as catch-all rewrites for sites ending in .com, .net, .org and so on, saving you a lot of (but not all) the manual rewrites.
Altering the conf_path() function should really be your last solution, since it will make updating Drupal a slower process (and if you forget to re-apply the changes after an update, your setup won't work any more).

Is it bad practice to rely on outbound rules for rewriting?

For the past few years, if I've wanted a URL of a page on a site rewritten I've put the rewritten URL into the link on the page.
E.g. If the page is /Product.aspx?filename=ProductA and it's rewritten to /Product/ProductA.aspx then I've put the following in my link:
...
However, with outbound rules I could just put the links in to the actual file paths, and rewrite with an outbound rule.
Is this a bad method? Would it cost the server unnessacery additional resources?
I would not consider this bad practice. Infact it affords you some additional flexibility as your mapping for friendly to real url's is all managed in one central location. If your seo team decide they want to change the url scheme, you dont have to pick through all the links on your site updating them- risking missing one!
One important limitation of the current version of the IIS rewrite module, is you cannot use outbound rewriting in conjunction with Static compression- However you can still use Dynamic compression. Static compression is nice because it will cache the compressed version of the page. See this article for instructions on getting url rewrite working with Dynamic compression: http://forums.iis.net/p/1165899/1950572.aspx

Plone, behaviour of URLs

The situation is the following: I created a site with Plone, developed, used, but behind a test URL. Now it has to be published, but the test URL is not appropriate and I don't want to move the site. I think, if I use a redirect, it won't be appear in the URL-bar, only in the case of site start page. Am I wrong? (The test URL should not be used, because it will be a "semi-official" site.) What do you suggest to do?
As far as I can see Plone uses absolute URLs everywhere. I can add relative URLs, but if I create a new page, a new event, etc., then they have absolute URLs on other automatically generated inner pages. Is there any way to convert these URLs to relative paths? Is there any setting possibilty where only a checkbox changes this default setting?
Plone does not store your URLs in the database. It uses the inbound host header (and any virtual hosting configuration set up with rewrite rules in Apache or Nginx) to calculate the correct absolute URL when rendering the page.
In other words - as soon as you actually point the relevant domain name to the server with your Plone instance, it'll just work.
P.S.
You should put a bit more effort into asking your question. This is just a copy and paste of a half-finished email chain where you tried to get the answer from me in private. It's not very easy to understand what you're asking.
I think what you are looking for is url rewriting to handle virtual hosting. ie to get your site to appear as if it's the root url of a domain.
This is normally done via the webserver that normally sits in front of plone. For apache, here is a howto
http://plone.org/documentation/kb/plone-apache/virtualhost
for other servers
http://plone.org/documentation/manual/plone-community-developer-documentation/hosting
You can also achieve this directly in zope (via ZMI) using something called the Virtual Host
Monster. see http://docs.zope.org/zope2/zope2book/VirtualHosting.html
PS. I don't think your question is badly worded. Plone does serve pages with a "base" tag and what appears to be absolute urls. They aren't baked into the database but it's also not obvious that the solution to getting the url you want is the VHM url syntax and a proxying frontend webserver. There is a reason why it doesn't use relative urls... which I can't remember it was so long ago.

Http caching - style.css?123 or style_123.css?

I'm currently playing around with a build / deployment script for minifying static resources. Following good practice I'd like to set an expire header far into the future for most of my javascript, stylesheet and images.
To my question, when one or more of the static files has changed clients should ask for the newest version of the file. Will adding something like /css/style.css ?1235 after the url be enough to trigger a new request? Or do I have to rename all my static files for each build (something like /css/style _12345.css)?
Update: Just to clarify, the reason that I'm asking is that I've noticed that a lot of other deployment scripts seems to take the "hard" path by renaming each file.
Will adding something like
/css/style.css ?1235 after the url be
enough to trigger a new request?
Yes, of course.
You can do it like this:
style.css?UNIXTIMESTAMP
where UNIXTIMESTAMP is the time when the file was created in unix timestamp format
RFC 2616 3.2 Uniform Resource Identifiers says:
As far as HTTP is concerned, Uniform Resource Identifiers are simply formatted strings which identify--via name, location, or any other characteristic--a resource.
So, http://foo/bar is a different resource to http://foo/bar?baz.
Some URIs are treated as equivalent, so http://foo/bar?baz is the same resource as http://foo/bar?BAZ (see 3.2.3 URI Comparison).
HTTP does contain some caching exceptions for handling the query part (see 13.9 Side Effects of GET and HEAD). I understand these to only apply to the exact same query.
if you can use mod_rewrite i add a rule like this:
RewriteRule ^/((.*)_[0-9]+(\.(?:js|css|swf).*))$ /$2$3 [NC,L]
which rewrites any javascript, css or flash file back to the original name so
/css/style_1234.css
gets rewritten back to
/css/style.css

Resources