How to detect proxy requests? [duplicate]

I know this is a popular question, and I have read all the topics about it, but I want to settle it for myself.
Goal: detect whether the user is behind a proxy.
Reason: if the user is behind a proxy, geo-targeted advertising should not be shown. I only need a boolean result.
Decision:
1. Use a database of proxy IPs (for example, MaxMind);
2. Check the Connection: keep-alive header, since cheap proxies do not use persistent connections, while all modern browsers do;
3. Check other popular headers;
4. Use JavaScript to detect web proxies by comparing the browser's host with the real host.
Questions:
1. Can you recommend a database? I have read about MaxMind, but some people write that it is not effective.
2. Is checking the Connection header a reasonable approach?
3. Have I missed anything?
PS: Sorry for my English, I am still learning it.

Option 1 you suggested is the best option. Proxy detection can be time consuming and complicated.
As you mentioned MaxMind and your concern about effectiveness, there are other APIs available, like GetIPIntel. It's free and very simple to use. They go beyond simple blacklists and use machine learning and probability-based algorithms to compute a probability value, which makes the results very accurate.
Option 2 doesn't hurt to implement unless you get a lot of false positives. Options 3-4 should not be used alone because they are very easy to get around. All browser actions can be automated, and just because someone is using a proxy does not mean they are not using a real browser.
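For the header checks in options 2-3, a minimal Node/TypeScript sketch might look like the following. The header list is just a set of commonly cited proxy headers, and the function name is illustrative; a transparent or well-configured proxy may send none of these, so treat it only as a cheap first-pass signal.

```typescript
import { IncomingMessage } from "http";

// Headers that anonymizing or forwarding proxies commonly add.
const PROXY_HEADERS = ["via", "x-forwarded-for", "forwarded", "proxy-connection"];

function looksLikeProxy(req: IncomingMessage): boolean {
  // Any of these present => the request very likely passed through a proxy.
  // Their absence proves nothing.
  return PROXY_HEADERS.some((h) => req.headers[h] !== undefined);
}
```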

The best way is definitely to use an API. You could use the database from MaxMind, but then you need to keep downloading it and rely on them to keep the data up to date. And as you said, there are questions about the accuracy of MaxMind's data.
Personally I would recommend you try https://proxycheck.io which, full disclosure, is my own site. You get full access to everything for free: premium proxy detection and blocking with 1,000 daily queries.
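As a rough idea of what such an API lookup looks like, here is a hedged TypeScript sketch against proxycheck.io's v2 endpoint. The URL shape and response fields ("proxy": "yes"/"no", keyed by the IP) are assumptions based on the public documentation, so verify them before relying on this.

```typescript
// Sketch: query an IP reputation API and reduce the answer to a boolean.
async function isProxy(ip: string, apiKey: string): Promise<boolean> {
  const res = await fetch(`https://proxycheck.io/v2/${ip}?key=${apiKey}&vpn=1`);
  if (!res.ok) throw new Error(`lookup failed: ${res.status}`);
  const data = await res.json();
  // The per-IP object is keyed by the IP itself; "proxy" is reported as "yes"/"no".
  return data[ip]?.proxy === "yes";
}
```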

You can evaluate the IP2Proxy database, which is updated daily. It detects open proxies, web proxies, Tor and VPNs. https://www.ip2location.com/database/px2-ip-proxytype-country
Checking the Connection header is inaccurate for proxy types such as VPNs.
Header checks are easily defeated; each new generation of proxies works around the previous generation of detection methods.
Based on our experience, the best method of proxy detection is an accurate blacklist.

Related

Can we monitor windows network information in realtime using minifilters?

I am trying to write a minifilter that more or less captures everything that happens in the kernel, and I was wondering if I could also capture "URLs"/network information. I stumbled upon WinDivert, which seems to use a .sys driver, and also another thread which says we cannot get URLs in driver mode, which leaves me a bit confused. If that is true, how does WinDivert do it?
I understand there is something called a network redirect under minifilters on learn.microsoft.com, which uses a DLL and a .sys file (same as WinDivert), but I could not find any resources that could help me build one.
Is there a better way to capture all visited URLs in real time?
Thanks in advance for any help or directions.
You're looking for Windows Filtering Platform and Filtering Platform Callout Drivers, which WinDivert is utilizing. This gives you the data that goes out over the wire, so for plain old HTTP over port 80 you can parse the requests to obtain the URL. This won't work for HTTPS since you're getting encrypted data over the wire; you'd have to implement some kind of MITM interception technique to handle that.
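To illustrate the user-mode half of that: once a WFP callout or WinDivert-style capture hands you the raw payload of a port-80 request, reconstructing the URL is just a matter of reading the request line and the Host header. A language-agnostic sketch (shown here in TypeScript; names are illustrative, and real traffic would need TCP stream reassembly first):

```typescript
// Sketch: rebuild a URL from a captured plain-HTTP request payload.
function extractUrl(payload: Buffer): string | null {
  const text = payload.toString("latin1");
  const requestLine = text.split("\r\n")[0];              // e.g. "GET /index.html HTTP/1.1"
  const pathMatch = requestLine.match(/^\w+\s+(\S+)\s+HTTP\//);
  const hostMatch = text.match(/\r\nHost:\s*([^\r\n]+)/i);
  if (!pathMatch || !hostMatch) return null;
  return `http://${hostMatch[1]}${pathMatch[1]}`;
}
```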

Deciding between TCP connection V/s web socket [closed]

We are developing a browser extension which would send all the URLs visited by a logged-in user to backend APIs to be persisted.
Since the number of requests sent to the backend API would be huge, we are unsure whether to keep a persistent connection via WebSocket or to make individual HTTP REST API calls over TCP.
The data posted to the backend API doesn't need to be real time, since we will only use it in our models, which don't require real-time input.
We are inclined towards HTTP REST API calls for the reasons below:
Easy to implement
Easy to scale (using auto-scaling techniques)
Everyone in the team is already comfortable with REST APIs
But at the same time, the cons would be:
At the scale where a lot of POST requests go to the server, we are not sure it would be efficient
It feels like WebSockets could give us a more efficient infrastructure :(
I would love to hear from the community about any pitfalls of the REST API option.
So first of all, TCP is the transport layer. It is not possible to use raw TCP as-is; you have to create some protocol on top of it and give meaning to the stream of data.
REST or HTTP or even WebSockets will never be as efficient as a custom-designed protocol on top of raw TCP (or even UDP). However, the gain may not be as spectacular as one might think. I've actually done such a transition once, and we saw only a few percent of performance gain. And it was neither easy to do correctly nor easy to maintain. Of course, YMMV.
Why is that? Well, the reason is that HTTP is already quite highly optimized. First of all, the keep-alive mechanism keeps a connection open while it is in use, so the default HTTP machinery already persists connections. Secondly, HTTP handles body compression by default, and with HTTP/2 it also handles header compression. With HTTP/3 you even get more efficient TLS usage and better behavior on unstable networks (e.g. mobile).
Another thing is that since you do not require real-time data, you can do buffering. Instead of sending data each time it becomes available, you gather it for, say, a few seconds, minutes or maybe even hours, and send it all in one go; a sketch of this is shown below. With such an approach the difference between HTTP and a custom protocol becomes even less noticeable.
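A minimal TypeScript sketch of that buffering idea, as it might look inside the extension; the endpoint and flush interval are arbitrary placeholders, not part of any real API:

```typescript
// Collect visited URLs in memory and flush them to the backend in one POST.
const buffer: { url: string; ts: number }[] = [];

function recordVisit(url: string): void {
  buffer.push({ url, ts: Date.now() });
}

async function flush(): Promise<void> {
  if (buffer.length === 0) return;
  const batch = buffer.splice(0, buffer.length);   // take everything collected so far
  try {
    await fetch("https://api.example.com/visits", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(batch),
    });
  } catch {
    buffer.unshift(...batch);                      // keep the data for the next attempt
  }
}

setInterval(flush, 5 * 60 * 1000);                 // flush every 5 minutes
```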
All in all: I advise you to start with the simplest solution there is, which in your case seems to be REST. Design your code so that a transition to another protocol later is as simple as possible. Optimize later if needed. Always measure.
By the way, there are lots of valid privacy and security concerns around your extension. For example, I'm quite surprised that you didn't mention TLS at all. It matters not only for security but also for performance: establishing TLS connections is not free (although once established, encryption does not affect performance much).
Putting my discomfort aside (privacy, anyone?)...
Assuming your extension collates the information, you might consider pushing to the server every time the browser starts or quits, and then once again every hour or so (users hardly ever quit their browsers these days); this would make REST much more logical.
If you aren't collating the information on the client side, you might prefer a WebSocket implementation that pushes data in real time.
However, whatever you decide, you would also want to decouple the API from the transmission layer.
This means that (ignoring authentication paradigms) the WebSockets and REST implementations would look largely the same and be routed to the same function that contains the actual business logic... a function you could also call from a script or from the terminal. The network layer details should be irrelevant as far as the API implementation is concerned.
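A rough sketch of that decoupling, assuming Node 18+ and the ws package; recordVisits and the route are made-up names, and the business logic never sees the transport:

```typescript
// One business-logic function, two thin transport adapters.
import { createServer } from "http";
import { WebSocketServer } from "ws";

async function recordVisits(userId: string, urls: string[]): Promise<void> {
  // ...persist to the database; transport details never reach this layer
}

const server = createServer(async (req, res) => {
  if (req.method === "POST" && req.url === "/visits") {
    let body = "";
    for await (const chunk of req) body += chunk;
    const { userId, urls } = JSON.parse(body);
    await recordVisits(userId, urls);
    res.writeHead(204).end();
  } else {
    res.writeHead(404).end();
  }
});

new WebSocketServer({ server }).on("connection", (ws) => {
  ws.on("message", async (raw) => {
    const { userId, urls } = JSON.parse(raw.toString());
    await recordVisits(userId, urls);
  });
});

server.listen(8080);
```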
As a last note: I would never knowingly install an extension that collects so much data on me. Especially since URLs often contain private information (used for REST API routing). Please reconsider if you want to take part in creating such a product... they cannot violate our privacy if we don't build the tools that make it possible.

How would I go about making my own Application protocol similar to http/https?

I don't know where to start, especially with which programming language and in what kind of environment. I know I would need two parts: a server which receives requests and sends the requested material back, and a client which sends requests and views the requested material, but I'm not sure where or how to start.
Thank you
What is the motivation for this? HTTP/HTTPS are tried, true and secure protocols that almost every web application communicates over.
I cannot fathom a possible reason to create your own, especially as it seems you are not quite experienced enough to do so, given how generic the question is.
My answer would be: don't do this. Use HTTP/HTTPS or WebSockets, whatever suits your application's requirements.

Advanced HTTP/2 proxy for load balancing of distributed scraping solution

I have built a distributed HTTP scraper solution that uses different exit addresses by design in order to balance the network load.
The solution supports IPv4, IPv6 and HTTP proxies for routing the traffic.
Each processor was responsible for choosing the most efficient route to balance the traffic, which was temporarily done by hand for prototyping. The solution keeps growing, and as the number of processors increases, the load-balancing task becomes more complex, which is why I need a dedicated component for it.
I did some rather extensive research but failed to find a solution for load balancing traffic across IPv6, IPv4 (thousands of local addresses) and public HTTP proxies. The solution needs to support weights, app-level response checks and cool-down periods.
Does anyone know a solution that already solves this problem? Before I start developing a custom one.
Thanks for your help!
If you search for "load balancing proxy" you'll discover the Cache Array Routing Protocol (CARP). CARP might not be exactly what you're searching for, and there are servers dedicated purely to proxy caching, which I never knew about until now.
Nevertheless, those servers have their own load balancers too, and perhaps that's a detail worth researching further.
I found a presentation that also mentions CARP as an outstanding solution: https://cs.nyu.edu/artg/internet/Spring2004/lectures/lec_8b.pdf
An example of proxy arrays in the Netra Proxy Cache Server: https://docs.oracle.com/cd/E19957-01/805-3512-10/6j3bg665f/index.html
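To give a feel for the CARP idea, here is a simplified TypeScript sketch of its member selection: each proxy scores a request by hashing (url + proxy name), and the highest score wins, which keeps the mapping stable when proxies are added or removed. The hash below is a toy FNV-1a; the real protocol defines its own hash combination and load factors.

```typescript
// Toy FNV-1a hash over a string.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// CARP-style selection: the proxy with the highest combined hash handles the URL.
function pickProxy(url: string, proxies: string[]): string {
  return proxies.reduce((best, p) =>
    fnv1a(url + p) > fnv1a(url + best) ? p : best
  );
}
```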
There are also several concepts for load balancing (https://link.springer.com/article/10.1023/A:1020943021842):
The three proposed methods can broadly be divided into centralized and decentralized approaches. The centralized history (CH) method makes use of the transfer rate of each request to decide which proxy can provide the fastest turnaround time for the next job. The route transfer pattern (RTP) method learns from the past history to build a virtual map of traffic flow conditions of the major routes on the Internet at different times of the day. The map information is then used to predict the best path for a request at a particular time of the day. The two methods require a central executive to collate information and route requests to proxies. Experimental results show that self-organization can be achieved (Tsui et al., 2001). The drawback of the centralized approach is that a bottleneck and a single point of failure is created by the central executive. The decentralized approach—the decentralized history (DH) method—attempts to overcome this problem by removing the central executive and put a decision maker in every proxy (Kaiser et al., 2000b) regarding whether it should fetch a requested object or forward the request to another proxy.
Since you use public proxy servers, you probably won't use decentralized history (DH), but rather centralized history (CH) or the route transfer pattern (RTP).
Perhaps it would even be useful to replace your own solution completely, e.g. with this: https://github.blog/2018-08-08-glb-director-open-source-load-balancer/. I have no particular attachment to this example; it just came up in my search results.
As I don't work with proxy servers, this post is just a collection of findings, but perhaps there is a usable detail in it for you. If not, never mind; you probably know most or all of this already. Note that I haven't recommended any concrete solution.
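On the weights/cool-down requirement from the question: if no off-the-shelf balancer fits, the core selection logic is small. A hedged TypeScript sketch, where the Exit shape, markFailure and the 60-second cool-down are all illustrative rather than part of any existing tool:

```typescript
// Weighted selection with a cool-down: exits that recently failed an
// app-level response check are skipped until their cool-down expires.
interface Exit {
  address: string;          // IPv4, IPv6 or "http://user:pass@proxy:port"
  weight: number;           // relative share of traffic
  cooldownUntil: number;    // epoch ms; 0 = healthy
}

function pickExit(exits: Exit[], now = Date.now()): Exit | null {
  const available = exits.filter((e) => e.cooldownUntil <= now);
  const total = available.reduce((sum, e) => sum + e.weight, 0);
  if (total === 0) return null;
  let r = Math.random() * total;        // weighted random choice
  for (const e of available) {
    r -= e.weight;
    if (r <= 0) return e;
  }
  return available[available.length - 1];
}

function markFailure(exit: Exit, cooldownMs = 60_000): void {
  // call this when an app-level response check fails for this exit
  exit.cooldownUntil = Date.now() + cooldownMs;
}
```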
Have you checked out this project? https://Traefik.io supports HTTP/2 and TCP load balancing. The project is open source and available on GitHub. It is built using Go. I'm using it now as my reverse proxy with load balancing for almost everything.
I also wrote a small blog post on Docker and Go where I showcase the usage of Traefik, which might also help you in your search: https://marcofranssen.nl/docker-tips-and-tricks-for-your-go-projects/
In the Traefik code base you might find your answer, or you might decide to use Traefik to achieve your goal instead of a home-grown solution.
See here for a nice explanation of the soon-to-arrive Traefik 2.0 with TCP support.
https://blog.containo.us/back-to-traefik-2-0-2f9aa17be305

Web browser as web server

Sorry if this is a dumb question that's already been asked, but I don't even know what terms to best search for.
I have a situation where a cloud app would deliver a SPA (single page app) to a client web browser. Multiple clients would connect at once and would all work within the same network. An example would be an app a business uses to work together - all within the same physical space (all on the same network).
A concern is that the internet connection could be spotty. I know I can store the client changes locally and then push them all to the server once the connection is restored. The problem, however, is that some of the clients (display systems) will need to show up-to-date data from other clients (mobile input systems). If the internet goes down for a minute or two it would be unacceptable.
My current line of thinking is that the local network would need some kind of "ThinServer" that all the clients would connect to. This ThinServer would then work as a proxy for the main cloud server. If the internet breaks then the ThinServer would take over the job of syncing data. Since all the clients would be full SPAs the only thing moving around would be the data - so the ThinServer would really just need to sync DB info (it probably wouldn't need to host the full SPA - though, that wouldn't be a bad thing).
However, a full dedicated server is obviously a big hurdle for most companies to set up.
So the question is, is there any kind of tech that would allow a web page to act as a web server? Could a business be instructed to go to thinserver.coolapp.com in a browser on any one of their machines? This "webpage" would then say, "All clients in this network should connect to 192.168.1.74:2000" (which would be the IP:port of the machine running this page). All the clients would then connect to this new "server" and that server would act as a data coordinator if the internet ever went down.
In other words, I really don't like the idea of a complicated server setup. A simple URL to start the service would be all that is needed.
I suppose the only option might have to be a binary program that would need to be installed? It's not an ideal solution, but perhaps the only one? If so, are there any programs out there that are single-click web servers? I've tried MAMP, LAMP, etc., but all of them are designed for the developer. Are there any that are more streamlined?
Thanks for any ideas!
There are a couple of fundamental ways you can approach this. The first is to host a server in a browser as you suggest. Some example projects:
http://www.peer-server.com
https://addons.mozilla.org/en-US/firefox/addon/browser-server/
Another is to use WebRTC peer-to-peer communication to allow the browsers to share information with each other (you could have them all share data, or have one act as a 'master', etc., depending on the architecture you want). It's likely not going to be that different under the skin, but your application design may be better suited to a more 'peer to peer' model or a more 'client server' one depending on what you need. An example 'peer to peer' project:
https://developer.mozilla.org/en-US/docs/Web/Guide/API/WebRTC/Peer-to-peer_communications_with_WebRTC
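As a taste of the peer-to-peer approach, the browser-side WebRTC API for a data channel looks roughly like this sketch. Signalling is deliberately omitted: the offer, answer and ICE candidates have to be exchanged between peers through some channel you provide (your cloud server, for instance), and the channel name and message payloads here are arbitrary.

```typescript
// Minimal RTCDataChannel sketch (browser side only).
const pc = new RTCPeerConnection();
const channel = pc.createDataChannel("sync");

channel.onopen = () => channel.send(JSON.stringify({ hello: "peer" }));
channel.onmessage = (ev) => console.log("got update:", ev.data);

async function createOffer(): Promise<string> {
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  return JSON.stringify(offer);        // hand this to the other peer via your signalling channel
}
```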
I have not used any of the above personally but I would say, from using similar browser extension mechanisms in the past, that you need to check the browser requirements before you decide if they can do what you want. The top one above is Chrome based (I believe) and the second one is Firefox. The peer to peer one contains a list of compatible browser functions, but is effectively Firefox and Chrome based also (see the table in the link). If you are in an environment where you can dictate the browser type and plugins etc then this may be ok for you.
The concept is definitely very interesting (peer-to-peer web servers) and it is great if you have the time to explore it. However, if you have an immediate business requirement, a simple on-site server-based approach may actually be more reliable, support a wider variety of browsers and be easier to maintain (as the skills required are quite commonly available).
BTW, I should have said - 'WebRTC' is probably a good search term for you, in answer to the first line of your question.
httprelay.io vs. WebRTC
Pros:
Simple to use
Fast
Supported by all browsers and HTTP clients
Can be used on unstable networks
Open source and cross-platform
Cons:
Need to run a server instance
No data streaming is supported (yet)
