I found several programs on the internet that can grab your website and download the whole site to your PC. How can one secure a website against these programs?
Link: http://www.makeuseof.com/tag/save-and-backup-websites-with-httrack/
You have to tell whether the visitor is a human or a bot in the first place. This is no easy task; see e.g.: Tell bots apart from human visitors for stats?
Then, once you have detected which bot it is, you can decide whether you want to give it your website content or not. Legitimate bots (like Googlebot) will conveniently identify themselves in their User-Agent string; malicious bots and web crawlers may disguise themselves as common browser programs.
There is no 100% solution, anyway.
If your content is really sensitive, you may want to add a CAPTCHA or user authentication.
I just used a great PDF converter, but I noticed that they enforce a 30-minute intermission between conversions (to win paying customers). So I got curious as to how the restriction might be implemented; as far as I can tell it doesn't seem to be (solely?) cookie-based.
An IP address doesn't seem likely (wouldn't that block entire NATted organizations collectively?), and using the filename would be too blunt. Can JavaScript generate hardware-unique info these days? What other ways are there? What is secure, what is easy to implement, and what is just rotten?
I think the problem here is to uniquely identify a client's browser.
Can JavaScript generate hardware-unique info these days? What other ways are there?
A simple (though perhaps not exhaustive) solution I can imagine is to consider not just the cookie or the IP address but all possible parameters, such as:
cookies
IP address
browser details
Flash cookies, and
information that can be pulled from a client's browser via JavaScript (which is enabled in most browsers and needed by most sites, including the one you mentioned), such as the installed plugins and their versions.
With all this information combined, one can identify a machine on the internet uniquely to a great extent.
What is secure, what is easy to implement and what is just rotten?
Personally, I have never implemented this, but it seems quite doable.
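As an illustration only, here is a minimal server-side sketch (ASP.NET/C#, chosen arbitrarily) of combining a few request-side signals into a single fingerprint; in a real implementation you would also append the client-side data (plugins, fonts, screen size) collected via JavaScript before hashing:

// Illustrative only: hashes a handful of request-side signals into one identifier.
// Client-side signals gathered with JavaScript would be appended to "raw" in practice.
using System.Security.Cryptography;
using System.Text;
using System.Web;

public static class ClientFingerprint
{
    public static string Compute(HttpRequest request)
    {
        string raw = string.Join("|", new[]
        {
            request.UserHostAddress,                   // IP address
            request.UserAgent ?? "",                   // browser details
            request.Headers["Accept-Language"] ?? "",  // locale hints
            request.Headers["Accept-Encoding"] ?? ""   // more browser details
        });

        using (var sha1 = SHA1.Create())
        {
            byte[] hash = sha1.ComputeHash(Encoding.UTF8.GetBytes(raw));
            var sb = new StringBuilder();
            foreach (byte b in hash) sb.Append(b.ToString("x2"));
            return sb.ToString();
        }
    }
}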
Some interesting links I found during this short bit of research:
Peter Eckersley. 2010. How unique is your web browser?. In Proceedings of the 10th international conference on Privacy enhancing technologies (PETS'10), Mikhail J. Atallah and Nicholas J. Hopper (Eds.). Springer-Verlag, Berlin, Heidelberg, 1-18.
How unique and trackable is your browser?
Is browser fingerprinting a viable technique for identifying anonymous users?
How do I uniquely identify computers visiting my web site?
Browser fingerprinting code snippet
Flash Cookies, a Little-Known Privacy Threat
I am building an application that needs to interact with users who don't have accounts and keep track of them. I know OpenID is great and easy, and I've used it in almost all my apps, but accounts are not an option here, not even ones the user is likely to already have, such as a Facebook, Google or Yahoo account.
Any coding language is acceptable (but ASP.NET, JavaScript or Flash would be best, or a combination).
So my plan is to use cookies... but cookies are so easily removed (I really don't count them as a reliable identifier).
IP address... well, this is effective even through proxies, but if someone uses a dynamic IP, as my whole country does, this also becomes unreliable.
Flash cookies are fine, but I recently read an article describing how Firefox's history-cleaning system gets rid of them too; I need confirmation of this.
Browser fingerprinting - I don't know how reliable it is, since anyone who knows a little of any language that can send HTTP requests can spoof it (the client string, at least).
If anyone knows of other methods besides the ones I listed, or wants to correct something in my list, feel free to reply.
I build ASP.NET websites (hosted under IIS 6 usually, often with SQL Server backends and forms authentication).
Clients sometimes ask if I can check whether there are people currently browsing (and/or whether there are users currently logged in to) their website at a given moment, usually so they can safely do a deployment (they want a hotfix, for example).
I know the web is basically stateless, so I can't be sure whether someone has closed the browser window, but I imagine there'd be some count of not-yet-timed-out sessions or something, and surely a count of logged-in users...
Is there a standard and/or easy way to check this?
Jakob's answer is correct but does rely on installing and configuring the Membership features.
A crude but simple way of tracking users online would be to store a counter in the Application object. This counter could be incremented/decremented upon their sessions starting and ending. There's an example of this on the MSDN website:
Session-State Events (MSDN Library)
Because the default session timeout is 20 minutes, the accuracy of this method isn't guaranteed (but then that applies to any web application, due to the stateless and disconnected nature of HTTP).
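For reference, a minimal Global.asax.cs sketch of that counter might look like the following; note that Session_End only fires reliably with in-process session state, which is another reason to treat the number as approximate:

// Global.asax.cs - a minimal sketch of the Application-level counter described above.
using System;
using System.Web;

public class Global : HttpApplication
{
    protected void Application_Start(object sender, EventArgs e)
    {
        Application["OnlineUsers"] = 0;
    }

    protected void Session_Start(object sender, EventArgs e)
    {
        Application.Lock();
        Application["OnlineUsers"] = (int)Application["OnlineUsers"] + 1;
        Application.UnLock();
    }

    protected void Session_End(object sender, EventArgs e)
    {
        // Only raised when session state mode is InProc.
        Application.Lock();
        Application["OnlineUsers"] = (int)Application["OnlineUsers"] - 1;
        Application.UnLock();
    }
}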
I know this is a pretty old question, but I figured I'd chime in. Why not use Google Analytics and view their real time dashboard? It will require minor code modifications (i.e. a single script import) and will do everything you're looking for...
You may be looking for the Membership.GetNumberOfUsersOnline method, although I'm not sure how reliable it is.
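If Membership is configured, the call itself is trivial; a hypothetical admin-only page might use it like this ("online" means activity within Membership.UserIsOnlineTimeWindow, 15 minutes by default):

// Requires the ASP.NET Membership provider to be configured for the site.
using System;
using System.Web.Security;
using System.Web.UI;

public partial class AdminStatus : Page   // hypothetical admin-only page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        int usersOnline = Membership.GetNumberOfUsersOnline();
        Response.Write("Users currently online: " + usersOnline);
    }
}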
Sessions, suggested by other users, are a basic way of doing this, but they are not very reliable: they work well in some circumstances and poorly in others.
For example, if users are downloading large files, watching videos or listening to podcasts, they may stay on the same page for hours (unless the requests for the binary data are also tracked by ASP.NET), yet they are still using your website.
Thus, my suggestion is to use the server logs to detect whether the website is currently being used by many people (see the sketch after the list below). This gives you the ability to:
See what sort of requests are being made. It's quite easy to tell humans from crawlers, and with some experience it's also possible to see whether the human is currently doing something critical (such as writing a comment, editing a document, or typing her credit card number to order something) or not (such as just browsing).
See who is making those requests. For example, if Google is crawling your website, it is a very bad idea to go offline, unless your search ranking doesn't matter to you. On the other hand, if a bot has been trying for two hours to crack your website by making requests to different pages, you can safely go offline.
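As a rough sketch of the log-based approach (the W3C field layout and UTC timestamps are assumptions that match a default IIS setup; adjust the log path and fields to your configuration), something like this would count the distinct client IPs seen in the last few minutes:

// Sketch only: counts distinct client IPs in an IIS W3C log within a recent time window.
// Field names (date, time, c-ip) are taken from the #Fields: header of the log file.
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

public static class RecentVisitors
{
    public static int CountRecentIps(string logPath, TimeSpan window)
    {
        DateTime cutoff = DateTime.UtcNow - window;   // IIS logs timestamps in UTC
        string[] fields = null;
        var ips = new HashSet<string>();

        foreach (string line in File.ReadLines(logPath))
        {
            if (line.StartsWith("#Fields:"))
            {
                fields = line.Substring("#Fields:".Length).Trim().Split(' ');
                continue;
            }
            if (fields == null || line.StartsWith("#")) continue;

            string[] parts = line.Split(' ');
            int dateIdx = Array.IndexOf(fields, "date");
            int timeIdx = Array.IndexOf(fields, "time");
            int ipIdx = Array.IndexOf(fields, "c-ip");
            if (dateIdx < 0 || timeIdx < 0 || ipIdx < 0 || parts.Length <= ipIdx) continue;

            DateTime stamp;
            if (DateTime.TryParse(parts[dateIdx] + " " + parts[timeIdx],
                    CultureInfo.InvariantCulture, DateTimeStyles.None, out stamp)
                && stamp >= cutoff)
            {
                ips.Add(parts[ipIdx]);
            }
        }
        return ips.Count;
    }
}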
Note: if a website has critical areas (for example, while writing this long answer, I would be angry if Stack Overflow went offline a few seconds before I submitted it), you can also send regular AJAX requests to the server while the user stays on the page. Of course, you must be careful when implementing such a feature, and take into account that it will increase the bandwidth used and will not work if the user has JavaScript disabled.
You can run the netstat command and see how many active connections exist to your website's ports.
The default port for HTTP is 80 (shown as *:80 in the output).
The default port for HTTPS is 443 (shown as *:443).
I have an ASP.NET 4 web site. I'm counting visitors in the background, but my code counts search engine bots too. How can I tell whether a client is a bot or a human? I don't want to count bots.
You can use the Crawler property of Request.Browser to filter search engine bots.
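For example (CountVisit below is a placeholder for whatever counting logic you already have):

// Request.Browser.Crawler is driven by ASP.NET's *.browser capability files,
// so it only recognizes crawlers that those files describe.
using System;
using System.Web.UI;

public partial class _Default : Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        if (!Request.Browser.Crawler)
        {
            CountVisit(); // placeholder for your existing visitor-counting code
        }
    }

    private void CountVisit()
    {
        // your counting logic goes here
    }
}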
You could check the User-Agent and then look for entries of type R, which denotes a robot or crawler.
See http://www.user-agents.org for more info.
I am sure there are cases where bots do not follow the standards, and you might have to handle those as one-offs.
Your best bet is probably checking the client's user agent:
http://support.microsoft.com/kb/306576
There may even be a quick little library out there for .NET with a lot of well-known user agents or good regexes to use. Note that some bots will send fake user agents to make it look like they're people, some people's browsers may send empty or unknown user agents, and so on. But those cases should be few and far between; for the most part this should get you pretty good statistics.
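As a rough illustration of such a check (the token list below is only a small, incomplete sample):

// Heuristic only: flags a request as a bot when the User-Agent contains
// one of a few well-known crawler tokens. Extend the list as you encounter new agents.
using System.Text.RegularExpressions;

public static class UserAgentCheck
{
    private static readonly Regex BotPattern = new Regex(
        @"bot|crawl|spider|slurp|bingpreview|facebookexternalhit",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static bool LooksLikeBot(string userAgent)
    {
        // Empty or missing user agents are suspicious but not conclusive.
        if (string.IsNullOrEmpty(userAgent)) return true;
        return BotPattern.IsMatch(userAgent);
    }
}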
You can try inspecting the User-Agent in the request header, for starters. A malicious bot will fake that, though. A more labor-intensive approach is to log and inspect your IP visits programmatically (look in the web log files, or collect them yourself) and try to deduce which of them are bots based on frequency of visits and so on. Quite a cat-and-mouse game.
If you want to block crawlers from accessing certain links, create a robots.txt file in your root directory, with something like:
User-agent: *
Disallow: /   # note: "/" on its own blocks the entire site, not just the default page
Disallow: /MyPage.aspx
Check http://en.wikipedia.org/wiki/Robots_exclusion_standard and http://www.google.com/#hl=en&q=robots.txt for more details.
We have a situation where we log visits and visitors on page hits, and bots are clogging up our database. We can't use CAPTCHA or similar techniques because this happens before we even ask for human input; basically, we are logging page hits and we would like to log only hits made by humans.
Is there a list of known bot IPs out there? Does checking known bot user-agents work?
There is no sure-fire way to catch all bots. A bot could act just like a real browser if someone wanted that.
Most serious bots identify themselves clearly in the agent string, so with a list of known bots you can filter out most of them. To the list you can also add the agent strings that some HTTP libraries use by default, to catch bots written by people who don't even know how to change the agent string. If you just log the agent strings of visitors, you should be able to pick out the ones to store in the list.
You can also make a "bad bot trap" by putting a hidden link on your page that leads to a page that's disallowed in your robots.txt file. Serious bots will not follow the link, and humans can't click on it, so only bots that don't follow the rules will request that page.
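A sketch of how the trap page itself might record offenders, assuming an ASP.NET site (the handler name and URL are made up for illustration; map it to a URL that your robots.txt disallows):

// Sketch of the "bad bot trap": anything that requests this handler despite the
// robots.txt disallow is very likely a rule-ignoring bot.
using System.Collections.Generic;
using System.Web;

public class BotTrapHandler : IHttpHandler
{
    // Application-wide set of IPs that have sprung the trap (lost on app restart).
    private static readonly HashSet<string> TrappedIps = new HashSet<string>();

    public void ProcessRequest(HttpContext context)
    {
        lock (TrappedIps)
        {
            TrappedIps.Add(context.Request.UserHostAddress);
        }
        context.Response.StatusCode = 403;
    }

    // Call this from your logging code to skip hits from trapped clients.
    public static bool IsTrapped(string ip)
    {
        lock (TrappedIps) { return TrappedIps.Contains(ip); }
    }

    public bool IsReusable { get { return true; } }
}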
Depending on the type of bot you want to detect:
Detecting Honest Web Crawlers
Detecting Stealth Web Crawlers
You can use Request.Browser.Crawler to detect crawlers programmatically; preferably, keep your list of recognized crawlers up to date as described here:
http://www.primaryobjects.com/cms/article102.aspx
I think many bots would be identifiable by user-agent, but surely not all of them. A list of known IPs - I wouldn't count on it either.
A heuristic approach might work. Bots are usually much quicker at following links than people. Maybe you can track each client's IP and measure the average speed with which it follows links. If it's a crawler, it probably follows every link immediately (or at least much faster than a human would).
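A rough sketch of that heuristic (the thresholds are illustrative and would need tuning for a real site):

// Flags an IP as a likely bot when it makes more than a threshold number of
// page requests within a short window.
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

public static class RequestRateTracker
{
    private static readonly ConcurrentDictionary<string, Queue<DateTime>> Hits =
        new ConcurrentDictionary<string, Queue<DateTime>>();

    public static bool LooksLikeBot(string ip, int maxHits = 20, int windowSeconds = 10)
    {
        DateTime now = DateTime.UtcNow;
        Queue<DateTime> queue = Hits.GetOrAdd(ip, _ => new Queue<DateTime>());
        lock (queue)
        {
            queue.Enqueue(now);
            // Drop hits that fall outside the sliding window.
            while (queue.Count > 0 && (now - queue.Peek()).TotalSeconds > windowSeconds)
                queue.Dequeue();
            return queue.Count > maxHits;
        }
    }
}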
Have you already added a robots.txt? While this won't stop malicious bots, you might be surprised at the legitimate crawling activity already occurring on your site.
I don't think there is a list of bot IP addresses; bot IP addresses are not static, and nobody knows exactly which clients are bots, including users who behave like bots.
Your question is arguably a hot research area right now; I'm curious whether someone can give a solution to this problem.
You can use any technique that tells you whether the visitor is human or not, and then filter your logs accordingly.
I think a good way to do this is to use a link that only non-human users (bots, crawlers, etc.) will follow, then gather their user-agents and filter them by user-agent.
You have to make the link invisible to humans for this to work.
You can also add a robots.txt to the root of your site and disallow that link there.