How do I prevent site scraping by using IP Blocking? - web-scraping

We are facing multiple attacks on our website. We are basically a classified goods website that helps customers connect with each other.
Bots are stealing data from our website every day by submitting fake mobile numbers. We don't have a login mechanism. Users have to share their contact number to get other users' contact details.
I read the Stack Overflow Q&A thread on preventing scraping.
An intelligent bot can easily avoid the files listed in robots.txt, and it can change its cookies and user agent.
Even with a CAPTCHA, they can steal data manually by entering random numbers.
We are planning to flag leads as suspicious if there are more than x leads per day. The problem here is that a bot can use services like Tor to generate unlimited IPs. How can we solve this? That is, if they have unlimited IPs and unlimited numbers, what can the solution be?
All suggestions are welcome except for adding OTP, as we already have that in mind.
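For reference, the "more than x leads per day" flag described above could be sketched roughly as follows. This is only a minimal in-memory illustration, not our production code: the class name `LeadMonitor`, the threshold of 5 and the choice of key (a phone number, an IP, or both) are all assumptions. It also only helps while the same key keeps reappearing, which is exactly the limitation with Tor and throwaway numbers.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 24 * 60 * 60  # one day
MAX_LEADS_PER_DAY = 5          # the "x" threshold; this value is an assumption


class LeadMonitor:
    """Counts lead requests per key in a sliding one-day window (hypothetical helper)."""

    def __init__(self, limit=MAX_LEADS_PER_DAY, window=WINDOW_SECONDS):
        self.limit = limit
        self.window = window
        self._hits = defaultdict(deque)  # key -> timestamps of recent requests

    def record(self, key, now=None):
        """Record one lead request; return True if the key is now suspicious."""
        now = time.time() if now is None else now
        hits = self._hits[key]
        hits.append(now)
        # Drop timestamps that have fallen out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        return len(hits) > self.limit
```

The key could be `("phone", number)` or `("ip", address)`, recorded separately, so that either one repeating too often raises the flag.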

Related

How would you identify if a visitor to one of your sites is the same person who visited another site of yours before (different domain)?

My question is more of a conceptual one, but in my specific case I am using Google Analytics 4. If the question is unclear, here it is in scenario form: Some guy visits my site x.com after a Google search. He closes the tab, does another Google search, and arrives at my other site y.com. How do I know it's the same person? I don't think there's anything I can do with User IDs in this situation. How would I solve this?
This isn't without fault, but if you implement it via Google Tag Manager you have more control over the data being sent, and even more so if you transport the data through a Google Tag Manager server-side container.
You would use a single server (but possibly different containers) or BigQuery, and store the data with either the templateDataStorage API call or the BigQuery API call.
Essentially, the first time you see a Google client ID (CID), an IP address, or a combination of user agent and IP address, you would store it on the server or in a BigQuery table as a key and create a random associated value next to it.
On every hit, across all your sites, you would check whether the IP address, the CID, or the combination of user agent and IP already exists on the server or in the BigQuery table. If it does, you output the stored random value as a custom dimension; if it does not, you create a new one.
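The lookup described above can be sketched in plain Python. This is a minimal in-memory stand-in, assuming a dictionary in place of the server-side storage or BigQuery table; `VisitorIdStore` and `resolve` are hypothetical names, not part of any Google API.

```python
import uuid


class VisitorIdStore:
    """Maps observed identifiers to a stable random visitor ID (in memory only)."""

    def __init__(self):
        self._ids = {}  # identifier key -> random visitor ID

    def resolve(self, cid=None, ip=None, user_agent=None):
        """Return the visitor ID for these identifiers, creating one if unseen."""
        keys = []
        if cid:
            keys.append(("cid", cid))
        if ip and user_agent:
            keys.append(("ip+ua", ip, user_agent))
        # Reuse the ID stored for the first identifier that was seen before.
        visitor_id = next((self._ids[k] for k in keys if k in self._ids), None)
        if visitor_id is None:
            visitor_id = uuid.uuid4().hex
        # Remember every identifier from this hit under the same visitor ID.
        for k in keys:
            self._ids.setdefault(k, visitor_id)
        return visitor_id
```

A hit on x.com and a later hit on y.com carry different CIDs, since each domain has its own cookie, but they resolve to the same value when the IP and user agent match. That is also where the false positives come from: everyone behind a shared IP with the same browser collapses into one "visitor".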
Actually you probably wouldn't.
Presumably you could try fingerprinting, but depending on your jurisdiction that might not be quite legal, and it tends to work a lot better in a lab than in real life. Browsers are also starting to implement anti-fingerprinting measures, such as trimming the user agent and denying access to browser properties such as installed plugins.
I have heard of experimental approaches to recognize users via usage patterns, e.g. how they move their mouse. I am not aware of any actual product that uses this, and I am not convinced it is a useful (or even legal) approach.
But in general, when it comes to cross-domain detection for unrelated visits (moving from domain to domain works via link decorators, and even that is affected by browser protections), you have the combined power of the browser vendors against you. They try to make this harder, either out of genuine concern for privacy or to establish themselves as the single gatekeeper for user identity. For example, Google has a huge user base that is almost constantly logged in to Google accounts or Android smartphones, which helps it identify users all over the web.

How can I check if it is the same user in ASP.NET?

This question is not related to ASP.NET specifically, but more web applications in general.
I am building a web application wherein I am registering a user. As of now I am taking in very basic credentials like First Name, Last Name, etc of the user. In this website I am giving some information for free for any user who has just registered so that the user finds my website authentic and that it is not a fake website. After that, to get more information, the user has to pay.
The information my site provides will become obsolete after some time. So, when a new user registers, he/she will get the new, updated information, but the old users have to pay to get the same new information.
My problem here is once the information gets obsolete the same person can re-register with a different set of credentials and get the new information. I want to avoid this from happening.
So my question here is this: what information should I request from the user, or extract from the user, to check that the same user is not re-registering? Or any other way to make this possible.
I am thinking of getting the IP address of the machine from which the person is registering and use it to check. But the user can use a different machine to re-register.
I am completely lost here and not getting the solution. I even checked on the Internet but could not find an answer.
Please let me know if you need any further information from my side.
You will not find a technical way to prevent users from registering multiple times. They can simply use another device, IP, another email account and different credentials.
What you can do is ask them to send you hard-to-fake "offline" information, like a credit card number or a photo of an ID. Some users may still be able to register multiple times this way, but probably not indefinitely. You will, however, lose many potential clients who are unwilling to provide such information for a test account, so this is likely not the solution you want.
My advice would be one of the following two:
Limit the information/service you give out to free users, so that even if they register again they will gain something when they pay.
Try to bind them to their account in a way where they would lose something if they threw it away. This may for example be providing user rewards for activity (real or virtual) or increasing their experience based on their history. Take SO for example: If you registered again, you would lose all your reputation. The users will think twice if this is worth the new content.
After reading all of the above, I think a good solution could be to let the user identify himself through Facebook or LinkedIn. Few people will have a second account.
I don't think you can restrict users like that, because everything can be duplicated.
There are some ways that require the user to provide a payment method or identity details such as a passport. If it is a Windows application, you could use a fingerprint scanner, which will definitely be unique.
You can do this (with limitations) with the use of cookies. Setting a cookie on the user's device will allow you to determine who the visitor is and that they have already registered.
The limitations are that cookies can be deleted or blocked and are only valid for that specific user agent: the user could use a different device or a different browser on the same device. A lot of people don't really know about cookies, though, or how to delete them.
By combining this technique with a requirement to provide a valid email address, you can make it a hassle for somebody to register more than once, as they will have to create a new email account and then delete their cookies.
Whether this will stop enough people depends on your site and your requirements - if you're giving money away then this technique is not nearly good enough. If you just want to discourage the practice of multiple accounts it may be enough.
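As a rough illustration of the cookie technique, here is a framework-independent sketch using only Python's standard `http.cookies` module. The cookie name `registered`, the one-year lifetime and both function names are assumptions made for the example; a real site would set and read the cookie through its own web framework.

```python
from http.cookies import SimpleCookie

COOKIE_NAME = "registered"  # hypothetical cookie name


def registration_cookie_header(max_age_days=365):
    """Build the Set-Cookie header value to send after a successful sign-up."""
    cookie = SimpleCookie()
    cookie[COOKIE_NAME] = "1"
    cookie[COOKIE_NAME]["max-age"] = max_age_days * 24 * 60 * 60
    cookie[COOKIE_NAME]["path"] = "/"
    cookie[COOKIE_NAME]["httponly"] = True
    return cookie[COOKIE_NAME].OutputString()


def has_registered_before(cookie_header):
    """Return True if the request's Cookie header carries the marker."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    return COOKIE_NAME in cookie and cookie[COOKIE_NAME].value == "1"
```

On the registration page, `has_registered_before(request_cookie_header)` would decide whether to show the sign-up form or a "you already have an account" message. As noted above, clearing cookies or switching browsers defeats it.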
Your only way out is to have SOMETHING the existing user gets as a "gift" or added value for maintaining just one account. If you can identify items of value to your subscribers and offer to "give" them, provided their account "attains" one or more statuses, then you'll get some control. Take stackoverflow.com for example: I don't need a second account.
Identifying by Facebook or LinkedIn is a good option, but if the services you are giving away are very beneficial to users, they won't mind creating multiple accounts, even on Facebook or LinkedIn.
So what I think is to attach some kind of reward to each user, and increase the services available as their rewards increase. Once they have built up rewards and can use multiple services, it becomes more likely that they will not create another account.

Is it possible to save computer id upon website sign up?

I am looking for a solution to stop multiple sign-ups on an upcoming website of mine, but I am looking for alternatives besides saving and tracing the IP.
So I was thinking about computer ID saving on the server. Is that possible?
No, that's not possible, simply because this information is never sent over the network. The only information you can reliably get from a user visiting your website is his IP address, in addition to the standard HTTP headers, which might or might not contain information about the user agent he is using, the language he configured in his browser, ...

How do websites prevent multiple votes without required login

A friend of mine showed me a website recently where a person could vote for something. There was no login required, but when I tried to vote more than once (per day), the web site knew. What are possible ways for this to be done?
My first thought was IP address, but I don't think that would work. If I'm in a large office building, at work, or on public Wi-Fi (Starbucks, airports, etc.), wouldn't it be the case that only 1 person per shared IP address could vote?
What if I drove around the city voting with my phone? If the website were to simply log IPs, wouldn't I theoretically be able to vote once for every cell tower I was close to?
If cookies were used, wouldn't it be possible to disable cookies and vote infinitely?
What mechanism is used to create this type of behavior?
Almost certainly done with a cookie.
It probably tests first that cookies are enabled, and only then lets you vote.
Try voting twice using two different browsers.

Considerations for anonymous users

So, the Web application I'm working on allows input from anonymous users (and their participation in the flagging system).
As for the spamming issue, would it be enough to use the honeypot method or is an image CAPTCHA (e.g. reCAPTCHA) necessary in this case?
For the flagging system, if I want to let anonymous users "flag" posts, it's not enough to allow one flag (per post) per cookie, because they have control over the cookies (and could bypass this prevention). I should allow ONLY one flag per IP then, right? I know that this method would prevent users that share the same IP (yeah, corporate networks, etc.) from flagging the same post, but there is no other way around it, is there?
How can I ensure anonymous users' anonymity? By this I mean, how do I prevent their posts from being "tracked" (if this is even possible)? I know that every server has a log of every connection, so is it possible to hide theirs?
Any help would be greatly appreciated!
Honeypots are useless if your site is popular, because then people will write custom bots for it. For the flagging, you can limit it to one per cookie, and rate-limit it by IP. That way, people on corporate networks, etc. will be a little inconvenienced but not completely out of luck.
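The "one per cookie, rate-limited by IP" rule could look something like the following sketch. It keeps everything in memory, and the class name `FlagGuard`, the limit of 5 flags per IP and the one-hour window are illustrative assumptions, not recommended values.

```python
import time
from collections import defaultdict, deque


class FlagGuard:
    """One flag per (cookie, post), plus a sliding-window rate limit per IP."""

    def __init__(self, ip_limit=5, ip_window=3600):
        self.ip_limit = ip_limit    # max accepted flags per IP per window (assumed)
        self.ip_window = ip_window  # window length in seconds (assumed)
        self._seen = set()          # (cookie_id, post_id) pairs already flagged
        self._ip_hits = defaultdict(deque)  # ip -> timestamps of accepted flags

    def allow_flag(self, cookie_id, ip, post_id, now=None):
        """Return True and record the flag if it passes both checks."""
        now = time.time() if now is None else now
        # Reject clients without a cookie and repeat flags from the same cookie.
        if not cookie_id or (cookie_id, post_id) in self._seen:
            return False
        hits = self._ip_hits[ip]
        # Forget accepted flags that are older than the window.
        while hits and hits[0] <= now - self.ip_window:
            hits.popleft()
        if len(hits) >= self.ip_limit:
            return False
        hits.append(now)
        self._seen.add((cookie_id, post_id))
        return True
```

Two colleagues behind the same corporate IP can both flag a post, since each has their own cookie, while someone clearing cookies in a loop runs into the per-IP limit.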
It's completely up to you what you log and how long you keep the logs. By default, the request IP may be logged, but you don't have to log it. Most sites do, but the real difference is how long they keep it.

Resources