www.domain.tld vs. domain.tld - http

TL;DR Should I redirect from www.domain.tld to domain.tld or vice versa?
I am running a CMS that handles multiple domains. Until now, the CMS has been in charge of redirecting www.domain.tld to domain.tld (or vice versa) for each domain individually, but I've decided to let mod_rewrite handle that in the future, for a few reasons (a rough sketch of the kind of rule I have in mind follows the list):
Performance: No need to fire up the CMS just to serve an HTTP redirect.
Consistency: For "historical reasons", some domains do the redirect in one direction and some do it in the other. That's not a real problem, it just gives me an itch.
Simplicity: Instead of having to worry about each domain individually, I'll have a solution for all existing and future(!) domains, even those that are not handled by the CMS.
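For context, this is roughly what I mean (just a sketch, assuming an Apache .htaccess with mod_rewrite enabled; it strips a leading "www." from whatever host was requested and keeps the path):
# Sketch: redirect www.<anything> to <anything> with a 301, for any domain
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]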
I know how to implement that, but what I don't know is whether there's a preferred "direction" for the redirect. I found very little information on the subject, but maybe I just searched for the wrong things. I remember reading somewhere (I believe on some page of Google's Webmaster Tools, but I can't find it right now) that it doesn't matter which one you choose as long as you stick to it.
Personally, I prefer domain.tld over www.domain.tld: that's how I type it in my browser, that's how I say it, and that's how I write it, because I think "www." is unnecessary garbage that looks bad, sounds bad, and costs time (to read, to write, or in a verbal conversation), space (in your address bar or when printed on paper), and bandwidth (I know, 4 bytes, but those 4 bytes accumulate).
But to be sure, I went to see how others do it, and all the big "internet companies" (Google, Facebook, Yahoo, eBay, Amazon) as well as Apple and Microsoft redirect to www.domain.tld. Sites that cater to a more technical audience are split: icann.org and w3.org redirect to the www version, while jQuery, GitHub, and Stack Overflow, for example, redirect to the non-www version. In fact, Stack Overflow's description of the 'no-www' tag says this:
The process of eliminating the usage of www to prefix URLs, for instance by redirecting users from http://www.example/ to http://example/. www is by many considered a dead and unnecessary practice.
So, are there any good reasons to prefer one over the other except the ones I already mentioned for getting rid of 'www.'? Or is it just a matter of personal taste and my findings are just a strange coincidence?
Side question
Not really a part of my current problem, but I noticed something intriguing and I'm curious: If there's https involved, most sites that I checked will handle it like this:
http://domain.tld -(301)-> https://domain.tld -(301)-> https://www.domain.tld
However, PayPal uses a 301 only for the second redirect and a 302 for the first, and Apple (iCloud) does it with just one redirect:
http://icloud.com -(301)-> https://www.icloud.com
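(If I wanted to copy Apple's single-hop approach, I imagine one rule along these lines would do it - just a sketch for an Apache .htaccess, with domain.tld as a placeholder for the canonical www host:)
# Sketch: send anything that is not already https://www.domain.tld there in a single 301
RewriteEngine On
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^(.*)$ https://www.domain.tld/$1 [R=301,L]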
Can anyone think of a reason for doing this one way or the other?

OK, so apparently my research was sloppy and I posted this question on the wrong part of the vast Stack Exchange network.
If anyone stumbles upon this question looking for answers: I found all the answers, plus links to further material, on Webmasters:
Should I include “www” in my canonical URLs? What are the pros and cons?
and there's also a related thread on meta:
Why isn't stackoverflow using www in the URL?

Related

Redirecting old pages to homepage/index

I have taken over an old domain and put a new site on it. It used to be a membership site, so I have thousands of old URLs that now go to a 404 page.
Should I redirect these to the homepage to keep the link juice? The new site is on a related subject, so the links are still useful.
If so, how would I do it? This is a WordPress site.
/user/*
/image//
What is the best way to deal with these 404s?
I would redirect them to your homepage. Google (and all other crawlers) will notice the change, and you can provide a better, uninterrupted experience for your users.
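If you'd rather handle it at the server level instead of (or before installing) a plugin, a couple of lines in .htaccess can do it. This is only a sketch, assuming Apache with mod_alias and that /user/ and /image/ really are the prefixes you want to catch:
# Sketch: send the old membership URLs to the homepage with a 301
RedirectMatch 301 ^/user/ /
RedirectMatch 301 ^/image/ /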
There is a brilliant plugin called Redirection (https://wordpress.org/plugins/redirection/) that lets you configure redirects in WP, including logs of the 404s your visitors hit, to help you see where redirects might be necessary.
A word of caution: there are some major implications to putting redirects in place, including effects on your site's SEO. I recommend you read up on the different kinds of redirects and what they do.
Further reading on the topic: https://developer.mozilla.org/en-US/docs/Web/HTTP/Redirections

URL hijacked, sort of - how, why and should I be worried?

I'm seeing a weird URL in my Google Analytics. It's http://www.lisaredstone.commessage57741318.cenokos.ru/ and it redirects to an ecommerce site. My client's site is http://www.lisaredstone.com. What's the story with this - is it something I need to be worried about? How do the .ru people leverage it to their benefit? What actions should I take?
It could just be someone duplicating your GA account number and using a rewrite redirect; .ru is the Russian extension. I did find this page, http://www.cradlecloud.com/how-to-ban-and-block-cenoval-ru-referrer-spam/, that may help you. If it isn't actively redirecting any of the links on the site (I tried several), I don't think there is anything to worry about, but someone who knows more about site security may be able to tell you more.
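If you also want to keep requests with that referrer away from the site itself, something along these lines in .htaccess is a common approach (only a sketch, assuming Apache with mod_rewrite; cenokos.ru is the domain from your report):
# Sketch: return 403 Forbidden for any request whose Referer contains cenokos.ru
RewriteEngine On
RewriteCond %{HTTP_REFERER} cenokos\.ru [NC]
RewriteRule .* - [F,L]
Bear in mind that a lot of this referrer spam never actually hits your server at all (it is injected straight into Analytics), in which case only a filter inside Google Analytics will clean up the reports.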

How to completely hide a website from search engines?

What's the best recommended way to hide my staging website from search engines? I Googled it and found that some say I should put in a meta tag, and some said I should put a text file inside my website directory. I want to know the standard way.
My current website is in ASP.NET, but I believe there must be a common way that works for any website, whatever its programming language.
Use a robots.txt file.
See here: http://www.robotstxt.org/robotstxt.html
You could also use your server's robots.txt:
User-agent: *
Disallow: /
Google's crawler actually respects these settings.
Really easy answer: password-protect it. If it's a staging site, then it quite likely is not intended to be publicly facing (private audience only, most likely). Trying to keep it out of search engines is only treating a symptom when the real problem is that you haven't appropriately secured it.
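On Apache, a minimal version of that is HTTP Basic auth in an .htaccess file (a sketch; the path to the password file is a placeholder, and the file itself would be created with the htpasswd utility):
# Sketch: require a username/password for the whole staging site
AuthType Basic
AuthName "Staging"
AuthUserFile /path/to/.htpasswd
Require valid-user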
Keep in mind that you can't hide a public-facing unprotected web site from a search engine. You can ask that bots not index it (through the robots.txt that my fine colleagues have brought up), and the people who write the bots may choose not to index your site based on that, but there's got to be at least one guy out there who is indexing all the things people ask him not to index. At the very least one.
If keeping automated crawlers out is a big requirement, some kind of CAPTCHA solution might work for you.
http://www.robotstxt.org/robotstxt.html
There are search engines and bookmarking services which do not use robots.txt. If you really don't want it to turn up, ever, I'd suggest requiring a CAPTCHA just to navigate to the site.
What's the best recommended way to hide my staging website from search engines?
Simple: don't make it public. If that doesn't work, then only make it public long enough to validate that it is ready to go live, and then take it down.
However, all that said, a more fundamental question is, "Why care?". If the staging site is really supposed to be the live site one step before pushing live, then it shouldn't matter if it is indexed.

SEO implications of a multi-lingual site with detection of system culture

I have developed a multi-lingual site in ASP.NET, which detects the user's system culture, and displays content in the appropriate language.
This is all working nicely, but my client has since had an SEO audit. The SEO agency has expressed a concern that this is not good SEO practice, as there are not unique URLs for each language.
They have suggested that the site may be accused of cloaking, and that Google may not index the site correctly for each different language.
Any ideas on whether these are valid concerns, and if there is an advantage to having unique URLs for each language version of the site?
Although you have done a beautiful job switching languages automatically, the SEO agency is correct!
That Google may not index the site correctly for each different language.
This is true! Google doesn't send the Accept-Language header, last time I checked. This means that Google will only index the default language.
They have suggested that the site may be accused of cloaking,
This depends on your exact implementation, but it is possible your site will receive a penalty!
There IS an advantage to having unique URLs for each language version of the site!
First of all, for your users: they can link to the language they prefer. Secondly, for the search engines, as they can index your site correctly.
Most of the time I advise redirecting the user only on the home page for a language switch, using a 302 redirect to the correct URL (and so the correct language). (Edit: you can review the post by Matt Cutts, "SEO Advice: Discussing 302 redirects".)
To verify my advice: install Fiddler and surf to http://www.ibm.com. As shown below, I received a 302 redirect to the appropriate language, arriving at www.ibm.com/be/en/.
# Result Protocol Host URL Body Caching Content-Type
4 302 HTTP www.ibm.com / 209 text/html
5 200 HTTP www.ibm.com /be/en/ 5.073 no-cache text/html;charset=UTF-8
There are a few ways you can solve this:
Start rewriting URLs (e.g. adding a directory for the language) - see the sketch below this list
If you don't want to go through the hassle of adding directories (or rewriting URLs), adding a query string would be the easiest solution (although try to limit it to a maximum of 2 parameters)
Another option is using different subdomains: www.website.com for the default language, then es.website.com, fr.website.com, and so on
Just make sure you always supply the same content for the same URL.
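To illustrate the first option: if the pages already accept a language parameter, a rewrite can expose them under language directories without duplicating anything. This is only a sketch in Apache mod_rewrite syntax for illustration (an ASP.NET site would more likely use IIS URL Rewrite or routing, and the lang parameter name is an assumption):
# Sketch: map /en/..., /fr/..., /es/... onto the existing pages via a language parameter
RewriteEngine On
RewriteRule ^(en|fr|es)/(.*)$ /$2?lang=$1 [QSA,L]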
Good luck with it!
Hopefully we will see some answers from people who know about the internals of Google (anyone?). But most suppositions about how Google's and others' crawlers work are just that... suppositions, and subject to change.
My guess is that you should use separate URLs for languages, even if they differ only by a ?language= parameter (although a truly different URL would be better). I believe this because when you go to google.it it offers "Google.com in English", and that link goes to... google.com. In other words, Google itself uses different URLs for different languages.
Also, another big site, Microsoft (they probably know about SEO) uses
http://www.microsoft.com/en/us/default.aspx
for US-English and
http://www.microsoft.com/it/it/default.aspx
for Italian (Italy), so it's probably best practice to differentiate based on language (and country).
In any case, I am totally annoyed when I'm on an English-language computer and I can't see a site in Italian or Spanish, and vice versa. As a matter of usability, not SEO strategy, the user should be able to override the language suggestion. This is how most big sites handle languages, too.

What does it mean when I see some IPs look at hundreds of pages on my website?

What should I do when I see some IP in my logs scrolling through hundreds of pages on my site? I have a WordPress blog, and it seems like this isn't a real person. This happens almost daily, with different IPs.
UPDATE: Oh, I forgot to mention that I'm pretty sure it's not a search engine spider. The hostname is not a search engine's, but some random host in India (it ends in '.in').
What I am concerned with is: if it is a scraper, is there anything I can do? Or could it possibly be something worse than a scraper, e.g. a hacker?
It's a spider/crawler. Search engines use these to compile their listings, researchers use them to figure out the structure of the internet, the Internet Archive uses them to download the contents of the Internet for future generations, spammers use them to search for e-mail addresses, and many more such situations.
Checking out the user agent string in your logs may give you more information on what they're doing. Well-behaved bots will generally indicate who/what they are - Google's search bots, for example, are called Googlebot.
If you're concerned about script kiddies, I suggest checking your error logs. The scripts often look for things you may not have; e.g. on one system I run, I don't have ASP, yet I can tell when a script kiddie has probed the site because I see lots of attempts to find ASP pages in my error logs.
Probably some script kiddie looking to take advantage of an exploit in your blog (or server). That, or some web crawler.
It's probably a spider bot indexing your site. The "User-Agent" might give it away. It is easily possible to have hundreds of GET requests for a dynamically generated WordPress site if it isn't all blog pages but includes things like CSS, JS, and images.
