I know you can specify a proxy for requests in PRAW using an environment variable, and I have had success doing so.
However, when building a custom session and specifying the proxy address like so:
from requests import Session
import praw

s = Session()
proxies = {'https': 'https://72.35.40.34:8080'}
s.proxies.update(proxies)
# praw.ini holds praw_bot_name oauth details
bot = praw.Reddit(praw_bot_name, requestor_kwargs={'session': s})
print(bot._core._requestor._http.proxies)
The proxy does not take effect. The correct address shows up in the print statement, but in Wireshark I can see that the traffic is not actually going through the proxy.
Does anyone know what might be going on here?
Okay, so the above should actually work. I had a mistake elsewhere in my code that was causing my requests to run through my read-only Reddit instance (which I had not defined a proxy for).
Will consider removing this question if others think it is not very useful.
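For anyone hitting the same symptom, a quick way to verify that a custom Session's proxy is actually in use is to ask an IP-echo service which address it sees (httpbin.org/ip is just one convenient choice, and the proxy address here is a placeholder):

from requests import Session

s = Session()
s.proxies.update({'https': 'https://72.35.40.34:8080'})  # placeholder proxy
# If the proxy is in effect, the reported origin IP should be the
# proxy's address, not your own.
print(s.get('https://httpbin.org/ip', timeout=10).json())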
I'm trying to do something that seems easy, so I think I should ask for help, because it's looking complicated. Usually this means I'm asking the wrong question.
I want to tell nginx about a new location handler. It will just use gRPC to ask a back end how to respond. (Probably nginx has something even niftier than gRPC, but my starting point is the approach I use for the other components in my system.)
Looking through lists of nginx modules, I'm thinking I'll have to write a new module from scratch. Surely that is wrong.
Very specifically, I receive an http request that looks like this:
http://www.example.com/some-text-that-might-be-long
and I want the nginx handler to call my service with that URL and the HTTP request headers. It will get back a reply telling it how to respond: typically either an instruction to issue a 302 redirect to some other URL, or another status code with some content.
Any pointers much appreciated.
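In case it helps to make the protocol concrete, here is a minimal sketch of the decision side, written as a plain-HTTP service in Python purely for illustration (the transport, the port, and the JSON answer shape are all assumptions, not an nginx API):

from http.server import BaseHTTPRequestHandler, HTTPServer
import json

class DecisionHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Decide how nginx should respond, based on the forwarded
        # path and headers.
        if self.path.startswith('/some-text'):
            answer = {'status': 302, 'location': 'https://www.example.com/elsewhere'}
        else:
            answer = {'status': 404, 'body': 'no mapping for this URL'}
        payload = json.dumps(answer).encode()
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.send_header('Content-Length', str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

HTTPServer(('127.0.0.1', 9000), DecisionHandler).serve_forever()

The nginx side would still need a module (or something like the njs scripting module) to forward each request to this service and act on the answer.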
My Meteor server will fetch data from another source on the Internet. The request has to go via a proxy. How can I specify the proxy server for server-side HTTP.call requests?
You could easily make all HTTP.* calls through a proxy if only the Meteor developers accepted my pull request to pass options like proxy through to the request module, on which the HTTP package is based.
Please comment on this GitHub issue to ask for that.
UPDATE: Since the Meteor devs refused to implement that change, I published an Atmosphere package that lets you transmit to Node (i.e. to the request module) any options you want.
Check out http-more on Atmosphere.
Found a solution for my problem.
I'm using Windows and could not find a way to set a default proxy for the OS as Serkan mentioned. Setting the proxy server in Internet Explorer's Internet Options LAN settings did not work. Setting the proxy in WinHTTP did not work. Does anyone else know how to do it?
The most reasonable approach would be for Node to read an environment variable and use that. So I created an environment variable HTTP_PROXY, and to see whether Node would read it I tried:
D:\Appl\.meteor\tools\a5dc07c9ab\bin>node -e "console.log(process.env.http_proxy)"
and it did output my variable. But when trying to make an http.get() request directly within Node, it failed. Node is obviously not using that variable...
The conclusion is that I have to explicitly set the proxy in my app, but that is not possible with Meteor HTTP. Instead I can use the request module (which Meteor HTTP uses internally) and set the proxy there. Not the ideal solution, because my app has to know about the proxy, but OK for my purpose.
if (Meteor.isServer) {
  // Include the request module (the same one Meteor HTTP uses internally).
  var request = Npm.require("request");

  // Set the proxy property on every request made through this wrapper.
  function thirdLibMakeRequest(options, callback) {
    options.proxy = "http://myProxyServer:8080";
    request(options, callback);
  }

  // Wrap the third-party async method so we can call it synchronously in Meteor.
  var makeRequest = Meteor._wrapAsync(thirdLibMakeRequest);

  // Use makeRequest to make requests.
  var response = makeRequest({ url: "http://UrlToSomeSite" });
}
1. Include the request module.
2. Wrap the third-party async method so we can use it in Meteor.
3. Set the proxy property on each request's options.
4. Use makeRequest to make requests.
Since the platform your Meteor app runs on sits behind the proxy as a whole, you'll need proxy access generally anyway.
You can therefore set your platform (OS) up to connect through the proxy server by default; Meteor will not need to know or care about the proxy, since it will be transparent to it.
My web app makes requests to third-party servers, and we sometimes route them through proxies. I'd like to be able to "see what they see": what the request looks like once it has been routed through the proxy.
Specifically, I'm interested in how much identifying information about the source (my web app) is left in the request once it reaches the destination, having been routed through the proxy.
Does anyone know an easy way to do this? Maybe a web service that will just echo back all the information about the incoming request in the outgoing response?
Not a full answer, but maybe you can try:
http://www.cantoni.org/2012/01/08/simple-webservice-echo-test
And the other two sites mentioned there:
http://respondto.it/
http://requestb.in/
These let you set up a URL to send your requests to and see whether the info they report helps you.
I'm just stating this as an idea that came to me: you could try sending requests to your own URL, which you control (i.e., a resource in your own web application). That way you can use your debugging infrastructure or other facilities (basically anything you want) to inspect the request that's coming into your application. This might be the most powerful and easiest way to do it. It won't let you test the URL you were originally trying to test, but in terms of proxy visibility, it might be what you need.
Good luck!
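Building on that idea, here is a minimal sketch of a self-hosted echo endpoint using Python's standard library (the port and the JSON response shape are arbitrary choices). Point your proxied requests at it and it reports back the method, path, client address, and headers it actually received:

from http.server import BaseHTTPRequestHandler, HTTPServer
import json

class EchoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Echo back everything visible about the incoming request,
        # so you can see what survived the trip through the proxy.
        info = {
            'method': self.command,
            'path': self.path,
            'client': self.client_address[0],
            'headers': dict(self.headers),
        }
        body = json.dumps(info, indent=2).encode()
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

HTTPServer(('0.0.0.0', 8000), EchoHandler).serve_forever()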
If the proxy supports the TRACE method and the Max-Forwards header you can use that. Not all do, however.
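As a sketch of that approach with Python's standard library (the proxy host and port are placeholders, and many proxies disable TRACE): Max-Forwards: 0 asks the first hop, i.e. the proxy itself, to answer with a copy of the request as it received it.

import http.client

# Placeholder proxy address; substitute your own.
conn = http.client.HTTPConnection('proxy.example.com', 8080)
# Max-Forwards: 0 tells the proxy not to forward the request but to
# respond itself, echoing the request it received.
conn.request('TRACE', 'http://www.example.com/',
             headers={'Host': 'www.example.com', 'Max-Forwards': '0'})
resp = conn.getresponse()
print(resp.status, resp.reason)
print(resp.read().decode(errors='replace'))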
I want to change first line of the HTTP header of my request, modifying the method and/or URL.
The (excellent) Tamper Data Firefox plugin allows a developer to modify the headers of a request, but not the URL itself. This latter part is what I want to be able to do.
So something like...
GET http://foo.com/?foo=foo HTTP/1.1
... could become ...
GET http://bar.com/?bar=bar HTTP/1.1
For context, I need to tamper with (i.e., correct) an erroneous request from Flash, to see whether the error can be fixed by fixing the URL.
Any ideas? Sounds like something that may need to be done on a proxy level. In which case, suggestions?
Check out Charles Proxy (multiplatform) and/or Fiddler2 (Windows only) for more client-side solutions - both of these run as a proxy and can modify requests before they get sent out to the server.
If you have access to the web server and it's running Apache, you can set up rewrite rules that will modify the URL before it gets processed by the main HTTP engine.
For those coming to this page from a search engine, I would also recommend the Burp Proxy suite: http://www.portswigger.net/burp/proxy.html
Although more specifically targeted towards security testing, it's still an invaluable tool.
If you're trying to intercept the HTTP packets and modify them on the way out, then Tamper Data may be the route you want to take.
However, if you want minute control over these things, you'd be much better off simulating the entire browser session using a utility such as curl.
Curl: http://curl.haxx.se/
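In the same spirit, if you'd rather script the replay than drive curl by hand, here is a quick sketch with Python's requests library (the URL and query parameters are just the example from the question):

import requests

# Replay the corrected request: the erroneous
#   GET http://foo.com/?foo=foo
# becomes
#   GET http://bar.com/?bar=bar
resp = requests.get('http://bar.com/', params={'bar': 'bar'})
print(resp.status_code)
print(resp.headers)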
I am trying to determine if there is a way to check the availability of a potentially large list of urls (> 1000000) without having to send a GET request to every single one.
Is it safe to assume that if http://www.example.com is inaccessible (as in unable to connect to the server, or the DNS request for the domain fails), or I get a 4XX or 5XX response, then anything from that domain will also be inaccessible (e.g. http://www.example.com/some/path/to/a/resource/named/whatever.jpg)? Would a 302 response (say, for whatever.jpg) be enough to invalidate the first assumption? I imagine subdomains should be considered distinct, as http://subdomain.example.com and http://www.example.com may not resolve to the same IP?
I seem to be able to think of a counter example for each shortcut I come up with. Should I just bite the bullet and send out GET requests to every URL?
Unfortunately, no: you cannot infer anything from 4xx or 5xx or any other codes.
Those codes are for individual pages, not for the server. It's quite possible that one page is down and another is up, or one has a 500 server-side error and another doesn't.
What you can do is use HEAD instead of GET. That retrieves the response headers for the page but not the page content. This saves time server-side (because it doesn't have to render the page) and for yourself (because you don't have to buffer and then discard the content).
Also I suggest you use keep-alive to accelerate responses from the same server. Many HTTP client libraries will do this for you.
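As a sketch of both suggestions together using Python's requests library (the URL list and timeout are placeholders): a Session reuses connections (keep-alive) across URLs on the same host, and head() avoids downloading response bodies.

import requests

# Placeholder list; in practice this would be your million-plus URLs.
urls = ['http://www.example.com/', 'http://www.example.com/whatever.jpg']

with requests.Session() as session:  # Session gives you keep-alive for free
    for url in urls:
        try:
            resp = session.head(url, timeout=10, allow_redirects=False)
            print(url, resp.status_code)
        except requests.RequestException as exc:
            print(url, 'failed:', exc)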
A failed DNS lookup for a host (e.g. www.example.com) should be enough to invalidate all URLs for that host. Subdomains or other hosts would have to be checked separately though.
A 4xx code might tell you that a particular page isn't available, but you couldn't make any assumptions about other pages from that.
A 5xx code really won't tell you anything. For example, it could be that the page is there, but the server is just too busy at the moment. If you try it again later it might work fine.
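A hedged sketch of that DNS pre-check in Python (the host names are placeholders): any host that fails to resolve can have all of its URLs skipped, while each distinct host still needs its own lookup.

import socket

# Placeholder hosts; each host name must be checked separately.
hosts = ['www.example.com', 'subdomain.example.com']

for host in hosts:
    try:
        socket.getaddrinfo(host, 80)
        print(host, 'resolves; its URLs still need individual checks')
    except socket.gaierror:
        print(host, 'does not resolve; skip every URL on this host')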
The only assumption you should make about the availability of a URL is that "getting a URL can and will fail".
It's not safe to assume that a subdomain request will fail when a parent one does, not least because in between your two requests your network connection can go up, down, or generally misbehave. It's also possible for the domains to be changed between requests.
Even ignoring all Internet connection issues, you are still dealing with a live website that can and will change constantly. What is true now might not be true in five minutes, when they decide to alter their page structure or change the way they display a particular page. Your best bet is to assume any GET can fail.
This may seem like an extreme viewpoint, but these events will happen. How you handle them will determine the robustness of your program.
First, don't assume anything based on a single page failing. I have seen many cases where IIS will continue to serve static content but be unable to serve any dynamic content.
You have to treat each host name as unique; you cannot assume subdomain.example.com and example.com point to the same IP. Even if they do, there is no guarantee they are the same site: IIS, again, has host headers that allow you to run multiple sites using a single IP address.
If the connection to the server actually fails, then there's no reason to check URLs on that server. Otherwise, you can't assume anything.
In addition to what everyone else is saying, use HEAD requests instead of GET requests. They function the same, but the response doesn't contain the message body, so you save everyone some bandwidth.