What's the recommended way to hide my staging website from search engines? I Googled it and found that some say I should add a meta tag, while others say I should put a text file in my website's directory. I want to know the standard way.
My current website is in ASP.NET, although I believe there must be a common approach that works for any website, whatever its programming language.
Use a robots.txt file.
See here: http://www.robotstxt.org/robotstxt.html
You could also use your server's robots.txt:
User-agent: *
Disallow: /
Google's crawler actually respects these settings.
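The meta tag you read about does the same job on a per-page basis. If you go that route, something along these lines in the <head> of each page should work ("noindex, nofollow" is the standard directive):

<meta name="robots" content="noindex, nofollow">

There is also an X-Robots-Tag: noindex HTTP response header if you'd rather set it server-wide instead of editing pages. Note the difference: robots.txt asks crawlers not to crawl, while the meta tag / header asks them not to index.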
Really easy answer: password-protect it. If it's a staging site then it quite likely isn't intended to be public-facing (private audience only, most likely). Trying to keep it out of search engines only treats a symptom; the real problem is that you haven't appropriately secured it.
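Since the question mentions ASP.NET, here is a minimal sketch of locking the whole site down via web.config; it assumes you have some authentication mode (Forms or Windows) configured, and the "?" means "deny anonymous users":

<configuration>
  <system.web>
    <!-- Sketch only: denies anonymous users site-wide; requires an authentication mode to be set up -->
    <authorization>
      <deny users="?" />
    </authorization>
  </system.web>
</configuration>

On IIS you could equally use IP address restrictions or Basic authentication at the server level; any of these keeps both crawlers and strangers out.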
Keep in mind that you can't truly hide a public-facing, unprotected web site from a search engine. You can ask that bots not index it (through the robots.txt my fine colleagues have brought up), and the people who write the bots may choose to honor that, but there is bound to be at least one crawler out there indexing exactly the things people ask it not to index. At the very least one.
If keeping automated crawlers out is a hard requirement, some kind of CAPTCHA solution might work for you.
http://www.robotstxt.org/robotstxt.html
There are search engines and bookmarking services which do not respect robots.txt. If you really never want the site to turn up, I'd suggest requiring a CAPTCHA just to navigate to it.
What's the recommended way to hide my staging website from search engines?
Simple: don't make it public. If that doesn't work, then only make it public long enough to validate that it's ready to go live, and then take it down.
That said, a more fundamental question is: why care? If the staging site really is just the live site one step before pushing live, then it shouldn't matter if it gets indexed.
I have a website that was developed with WordPress. It was hacked; I removed the malicious files I found on the server and got the site back, but when I search for the website on Google I find strange links that I can't open (see the attached photo).
Remove the URLs individually from Google Webmaster Tools; it will take some time for them to be removed.
You should know that removing malicious files doesn't mean you've cleaned up the site. In many cases one leftover file will recreate all the malicious files again, and sometimes it sits above your website's root folder. It's best to use a couple of plugins to scan the whole site directory, and then check a few days later whether the malicious files have returned. (If they do return, you're best off switching to a new server, or reformatting if you have the option, as it will get quite expensive to pay someone to clean up your server.)
First make sure you have completely cleaned up the hack. The hacked pages should then drop out of Google's index, since they no longer exist. It's probably not viable to remove every single hacked page from Google via Webmaster Tools, as there could be tens of thousands of them (depending on the hack).
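If you want a quick, crude way to watch for files changing behind your back while you monitor the site, something along these lines works (a plain Python sketch, nothing WordPress-specific; run it from the WordPress root, and remember that attackers can forge modification times, so treat matches as hints, not proof):

import os, sys, time

# Sketch: list .php files modified within the last N days (default 7).
days = int(sys.argv[1]) if len(sys.argv) > 1 else 7
cutoff = time.time() - days * 86400

for root, dirs, files in os.walk("."):
    for name in files:
        if name.endswith(".php"):
            path = os.path.join(root, name)
            mtime = os.path.getmtime(path)
            if mtime > cutoff:
                print(time.ctime(mtime), path)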
Below are a couple good scanners.
https://wordpress.org/plugins/quttera-web-malware-scanner
https://wordpress.org/plugins/sucuri-scanner
I would also recommend some plugins for enhanced security moving forward.
https://wordpress.org/plugins/ninjafirewall
https://wordpress.org/plugins/better-wp-security
https://wordpress.org/plugins/vulnerable-plugin-checker
We have just moved to Drupal and are trying to proactively identify all broken external web (http://, https://) links.
I've seen some references to link validation, but I wasn't sure whether that only means validating the syntax of the link, as opposed to checking whether the links actually work (e.g. don't return a 404).
What is the easiest way to go through all of the web links in a Drupal site and identify the broken external ones? This is something we'd like to automate and schedule daily or weekly.
As someone else mentioned, use the Link Checker module. It's a great tool.
In addition, you can check the Crawl Errors report in Google Webmaster Tools for 404'd links.
Clicking any URL in that report will show you where it was linked from, so you can update any broken internal links. Be sure to use canonical URLs to avoid that.
Make sure you're using a proper internal linking strategy to avoid broken internal links in the first place, too: http://www.daymuse.com/blogs/drupal-broken-internal-link-path-module-tutorial
Essentially: use canonical, relative links to avoid broken internal links in the future when you change aliases. In simple Drupal terms, be sure you're linking to "node/23" instead of "domain.ext/content/my-node-title" since multiple parts of that might change in the future.
I have not found a Drupal-based approach for this. The best free piece of software I've found for finding bad links on sites is the Screaming Frog SEO Spider Tool.
http://www.screamingfrog.co.uk/seo-spider/
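If you'd rather roll something you can drop into a cron job yourself, here is a rough sketch of the checking side (plain Python, nothing Drupal-specific; it assumes you can dump the external URLs you care about into a urls.txt file, one per line, e.g. by querying your link fields from the database):

import urllib.request, urllib.error

def status_of(url):
    # HEAD keeps traffic down; some servers reject it, in which case fall back to GET.
    req = urllib.request.Request(url, method="HEAD",
                                 headers={"User-Agent": "link-checker/0.1"})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code          # e.g. 404, 500
    except (urllib.error.URLError, OSError):
        return None            # DNS failure, timeout, connection refused

# urls.txt is an assumed input file: one external URL per line.
with open("urls.txt") as f:
    for url in (line.strip() for line in f if line.strip()):
        code = status_of(url)
        if code is None or code >= 400:
            print("BROKEN", code, url)

Schedule it daily or weekly with cron and mail yourself the output; the Link Checker module is still the better option if you want the results inside Drupal.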
I recently stumbled upon Etherpad, a collaborative writing tool:
http://code.google.com/p/etherpad/ - main project page
Online examples:
http://piratepad.net/
http://ietherpad.com/
http://typewith.me/
I want to add this engine to my WordPress site somehow and let people collaborate on their posts. I'm wondering whether it has been done before, and whether it takes more than shared hosting (which is what I have) to do it [server capabilities or what-not]?
In general, I think this is a complicated way to go about it. Also, Etherpad allows some very basic font formatting but no images and such things you might want to include in a blog. Instead I suggest looking for some Wordpress plugin for collaborative writing, and you might find something less "real-timey" but perhaps good enough.
Or if you really want to try with Etherpad:
Etherpad needs a lot of memory (RAM) to run. A typical configuration is 1 GB, but it might be possible to get by with 128 MB dedicated to Etherpad, which means you'll need at least 256 MB in total for a first attempt. Your shared host also needs to have a Java server installed (typically Jetty) and some proxying server (typically nginx). All in all, you have some work ahead of you just getting Etherpad up and running. After that comes integrating it into the WordPress blog editor; if or how this can be done, I don't know. I'd probably do a client-side JavaScript hack to get the WordPress textarea or rich-text edit area to update from the Etherpad read-only view, which is the only place you can get the contents of a pad as more-or-less raw source text.
A simpler solution would be to just add an Etherpad page through an iFrame. See this post for example - http://www.knowledgepolicy.com/2010/02/embed-etherpad-into-blogpost-or-on-any.html
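For example, something like this (the host and pad name are placeholders; Etherpad Lite serves pads at /p/<padname> on whatever host you run it on):

<!-- "your-etherpad-host" and "my-draft" are placeholders for your own install and pad -->
<iframe src="http://your-etherpad-host/p/my-draft" width="100%" height="500"></iframe>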
In theory it's possible to replace WordPress's editor with an Etherpad Lite iframe. Etherpad now supports image/font editing and tables via plugins.
Java is no longer required for Etherpad; Node.js, however, is.
Here is a plugin in development that does what you want; however, development seems to have stopped in early 2012.
http://participad.org/ seems to be the best solution in this space to date. I haven't tested it on my own site, but they have an at least partially-working demo online.
Yes! It is possible. WordPress now has a plugin for this. The plugin has three modules, which enable an editor in the dashboard and let you edit via the front end.
You can find more details on their FAQ page.
I am trying to get the languages installed on visitors' PCs.
The problem is that I don't want to get the languages from the web browser.
Any suggestions please?
The only (standard) way is to look at the HTTP 'Accept-Language' header. See the standard. It would be a security hole if you could get access to more information than that without asking permission.
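The header the browser sends looks something like this (an illustrative value; it's a weighted list of the user's preferred languages):

Accept-Language: en-US,en;q=0.9,de;q=0.7

Most server frameworks expose it for you; in ASP.NET, for example, it surfaces as Request.UserLanguages.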
You could run some ActiveX component to inspect users' computers, but you'd have to get them to give you permission first, and I suspect that would just put people off using your website. Also, it would only work on Windows. I wouldn't recommend doing this.
Of course, you can always ask your users to tell you via some settings page. If changing this setting would help them to use your site, they would probably not mind doing that.
What should I do when I see some IP in my logs scrolling through hundreds of pages on my site? I have a WordPress blog, and it seems like this isn't a real person. It happens almost daily with different IPs.
UPDATE: Oh, I forgot to mention, I'm pretty sure it's not a search engine spider. The hostname doesn't belong to a search engine; it's some random host in India (it ends in '.in').
What I'm concerned about is: if it is a scraper, is there anything I can do? Or could it possibly be something worse than a scraper, e.g. a hacker?
It's a spider/crawler. Search engines use these to compile their listings, researchers use them to figure out the structure of the internet, the Internet Archive uses them to download the contents of the Internet for future generations, spammers use them to harvest e-mail addresses, and so on.
Checking out the user agent string in your logs may give you more information on what they're doing. Well-behaved bots will generally indicate who/what they are - Google's search bots, for example, are called Googlebot.
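For example, Google's crawler typically identifies itself with a user agent string along these lines:

Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

Bear in mind the string is self-reported and trivially faked, so a scraper can claim to be anything it likes; Google's documentation describes verifying the real Googlebot with a reverse DNS lookup.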
If you're concerned about script kiddies, I suggest checking your error logs. Their scripts often look for things you may not have; for example, on one system I run I don't have ASP, yet I can tell when a script kiddie has probed the site because I see lots of attempts to find ASP pages in my error logs.
Probably some script kiddie looking to take advantage of an exploit in your blog (or server). That, or some web crawler.
It's probably a spider/bot indexing your site. The "User-Agent" header might give it away. It's easy to rack up hundreds of GET requests on a dynamically generated WordPress site if the log isn't just blog pages but also includes things like CSS, JS, and images.