SWI-Prolog http_post and http_delete inexplicably hang

When I attempt to use SWI-Prolog's http_post/4, as follows:
:- use_module(library(http/http_client)).

update(URL, Arg) :-
    http_post(URL, form([update = Arg]), _, [status_code(204)]).
When I query this rule and watch the TCP traffic, I see the HTTP POST request and the reply with the expected 204 status code both occur immediately. However, Prolog hangs for up to 30 seconds before returning 'true'. What is happening that prevents the rule from returning immediately?
I've tried this variant as well, but it also hangs:
:- use_module(library(http/http_client)).

update(URL, Arg) :-
    http_post(URL, form([update = Arg]), Reply, [status_code(204)]),
    close(Reply).
I have a similar issue with http_delete/3, but not with http_get/3.

The library docs state that http_post:
It is equivalent to http_get/3, except for providing an input document, which is posted using http_post_data/3.
http_get/3 has timeout(+Timeout) in its options. That could help to lower the latency, but as it is set to +infinite by default, I fear it will not solve the issue. It seems like the service you are calling keeps the connection alive up to some timeout.
Personally, I had to use http_open instead of http_post when calling Google API services over https...

Related

How to check if a Telnet connection is still established, using telnetlib

I'd like to check if a connection using the telnetlib is still up.
The way I do it is to send an ls command and check the answer, but I'm sure there must be a smoother solution.
I got the idea from here, so kudos to them; the code could be something like this:
import telnetlib

def check_alive(telnet_object):
    try:
        if telnet_object.sock:
            # Send a harmless IAC + NOP; it should not disturb the remote service.
            telnet_object.sock.send(telnetlib.IAC + telnetlib.NOP)
            telnet_object.sock.send(telnetlib.IAC + telnetlib.NOP)
            telnet_object.sock.send(telnetlib.IAC + telnetlib.NOP)
            return True
    except Exception:
        # The send fails once the connection has gone away.
        pass
The idea is pretty simple:
- if close() was called, .sock will be 0, so we do nothing;
- otherwise, we try to send something harmless that should not interact with whatever the underlying service is; IAC + NOP is a good candidate. Later edit: it seems that doing the send only once is not enough, so I just do it 3 times. It's not very professional, I know, but... "if it looks stupid, but it works... then it's not stupid";
- if everything goes well we reach the "return True" and thus get our answer; otherwise the exception gets ignored and, as there's no return, we get None as a response.
I've used this method for both direct and proxied (SocksiPy) connections against a couple of Cisco routers.
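For completeness, a minimal usage sketch of the helper above; the host and port are placeholders, not part of the original answer:
import telnetlib

# Purely illustrative host/port; substitute your own device.
tn = telnetlib.Telnet("192.0.2.10", 23, timeout=5)

if check_alive(tn):
    print("connection is still up")
else:
    print("connection appears to be down")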

Twain always returns TWRC_NOTDSEVENT

I use twain 2.3 (TWAINDSM.DLL) in my application with HP Scanjet 200 TWAIN Protocol 1.9.
My TWAIN calls are:
OpenDSM: DG_CONTROL, DAT_PARENT, MSG_OPENDSM
OpenDS: DG_CONTROL, DAT_IDENTITY, MSG_OPENDS
EnableDS: DG_CONTROL, DAT_USERINTERFACE, MSG_ENABLEDS
ProcessDeviceEvent: DG_CONTROL, DAT_EVENT, MSG_PROCESSEVENT
and as a result of the last call I always get TWRC_NOTDSEVENT instead of TWRC_DSEVENT.
Could please someone help with this?
Once you use DG_CONTROL / DAT_EVENT / MSG_PROCESSEVENT, all messages from the applications message loop must be sent to the data source for processing. Receiving TWRC_NOTDSEVENT means the forwarded message isn't for the source so the application should process it as normal.
Keep forwarding all messages to the source until you receive MSG_XFERREADY which means there is data to transfer. Once the transfer is finished and you have sent MSG_DISABLEDS that's when you can stop forwarding messages to the source.
TWAIN is a standard, and when many companies implement that standard, not all of them do it the same way. Along the way to supporting TWAIN, we learn and adjust the code to handle the different implementations.
I experienced this situation before, and this is my workaround:
Instead of placing the (rc == TWRC_DSEVENT) check at the beginning of the code (which would skip the MSG_XFERREADY processing that follows), you can move the comparison to the end, after the MSG_XFERREADY processing, so that MSG_XFERREADY is always checked and processed.
(rc == TWRC_DSEVENT) is only used to determine whether we should forward the window message or not.
I don't know your specific situation. I ran into a similar issue because I called OpenDSM with an HWND/wId from another process. You should call OpenDSM with the HWND of the active window/dialog that is owned by the current process.

IncompleteRead error when submitting neo4j batch from remote server; malformed HTTP response

I've set up neo4j on server A, and I have an app running on server B which is to connect to it.
If I clone the app on server A and run the unit tests, it works fine. But running them on server B, the setup runs for 30 seconds and fails with an IncompleteRead:
Traceback (most recent call last):
File "/usr/local/lib/python2.7/site-packages/nose-1.3.1-py2.7.egg/nose/suite.py", line 208, in run
self.setUp()
File "/usr/local/lib/python2.7/site-packages/nose-1.3.1-py2.7.egg/nose/suite.py", line 291, in setUp
self.setupContext(ancestor)
File "/usr/local/lib/python2.7/site-packages/nose-1.3.1-py2.7.egg/nose/suite.py", line 314, in setupContext
try_run(context, names)
File "/usr/local/lib/python2.7/site-packages/nose-1.3.1-py2.7.egg/nose/util.py", line 469, in try_run
return func()
File "/comps/comps/webapp/tests/__init__.py", line 19, in setup
create_graph.import_films(films)
File "/comps/comps/create_graph.py", line 49, in import_films
batch.submit()
File "/usr/local/lib/python2.7/site-packages/py2neo-1.6.3-py2.7-linux-x86_64.egg/py2neo/neo4j.py", line 2643, in submit
return [BatchResponse(rs).hydrated for rs in responses.json]
File "/usr/local/lib/python2.7/site-packages/py2neo-1.6.3-py2.7-linux-x86_64.egg/py2neo/packages/httpstream/http.py", line 563, in json
return json.loads(self.read().decode(self.encoding))
File "/usr/local/lib/python2.7/site-packages/py2neo-1.6.3-py2.7-linux-x86_64.egg/py2neo/packages/httpstream/http.py", line 634, in read
data = self._response.read()
File "/usr/local/lib/python2.7/httplib.py", line 532, in read
return self._read_chunked(amt)
File "/usr/local/lib/python2.7/httplib.py", line 575, in _read_chunked
raise IncompleteRead(''.join(value))
IncompleteRead: IncompleteRead(131072 bytes read)
-------------------- >> begin captured logging << --------------------
py2neo.neo4j.batch: INFO: Executing batch with 2 requests
py2neo.neo4j.batch: INFO: Executing batch with 1800 requests
--------------------- >> end captured logging << ---------------------
The exception happens when I submit a sufficiently large batch. If I reduce the size of the data set, it goes away. It seems to be related to request size rather than the number of requests (if I add properties to the nodes I'm creating, I can have fewer requests).
If I use batch.run() instead of .submit(), I don't get an error, but the tests fail; it seems that the batch is rejected silently. If I use .stream() and don't iterate over the results, the same thing happens as .run(); if I do iterate over them, I get the same error as .submit() (except that it's "0 bytes read").
Looking at httplib.py suggests that we'll get this error when an HTTP response has Transfer-Encoding: Chunked and doesn't contain a chunk size where one is expected. So I ran tcpdump over the tests, and indeed, that seems to be what's happening. The final chunk has length 0x8000, and its final bytes are
"http://10.210.\r\n
0\r\n
\r\n
(Linebreaks added after \n for clarity.) This looks like correct chunking, but the 0x8000th byte is the first "/", rather than the second ".". Eight bytes early. It also isn't a complete response, being invalid JSON.
Interestingly, within this chunk we get the following data:
"all_relatio\r\n
1280\r\n
nships":
That is, it looks like the start of a new chunk, but embedded within the old one. This new chunk would finish in the correct location (the second "." of above), if we noticed it starting. And if the chunk header wasn't there, the old chunk would finish in the correct location (eight bytes later).
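To make the failure mode concrete, here is a rough sketch of how a chunked body is consumed (modelled loosely on httplib's _read_chunked, not copied from it): every chunk must begin with a hex size line, so a chunk header that appears eight bytes early, or is missing where one is expected, leaves the parser short of data and triggers IncompleteRead.
# Simplified chunked-transfer reader; fp is a file-like object over the socket.
# Illustration of the framing rules only, not httplib's actual implementation.
def read_chunked(fp):
    body = []
    while True:
        size_line = fp.readline()                 # e.g. b"8000\r\n"
        size = int(size_line.split(b";")[0], 16)  # chunk size is hex, may carry extensions
        if size == 0:                             # zero-length chunk terminates the body
            break
        chunk = fp.read(size)
        if len(chunk) < size:                     # stream ended before the chunk did;
            raise ValueError("incomplete read")   # httplib raises IncompleteRead here
        body.append(chunk)
        fp.readline()                             # consume the CRLF that follows each chunk
    return b"".join(body)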
I then extracted the POST request of the batch, and ran it using cat batch-request.txt | nc $SERVER_A 7474. The response to that was a valid chunked HTTP response, containing a complete valid JSON object.
I thought maybe netcat was sending the request faster than py2neo, so I introduced some slowdown
cat batch-request.txt | perl -ne 'BEGIN { $| = 1 } for (split //) { select(undef, undef, undef, 0.1) unless int(rand(50)); print }' | nc $SERVER_A 7474
But it continued to work, despite being much slower now.
I also tried doing tcpdump on server A, but requests to localhost don't go over tcp.
I still have a few avenues that I haven't explored: I haven't worked out how reliably the request fails or under precisely which conditions (I once saw it succeed with a batch that usually fails, but I haven't explored the boundaries). And I haven't tried making the request from Python directly, without going through py2neo. But I don't particularly expect either of these to be very informative. And I haven't looked closely at the TCP dump except for using Wireshark's 'follow TCP stream' to extract the HTTP conversation; I don't really know what I'd be looking for there. There's a large section that Wireshark highlights in black in the failed dump, and only isolated black lines in the successful dump; maybe that's relevant?
So for now: does anyone know what might be going on? Anything else I should try to diagnose the problem?
The TCP dumps are here: failed and successful.
EDIT: I'm starting to understand the failed TCP dump. The whole conversation takes ~30 seconds, and there's a ~28-second gap in which both servers are sending ZeroWindow TCP frames - these are the black lines I mentioned.
First, py2neo fills up neo4j's window; neo4j sends a frame saying "my window is full", and then another frame which fills up py2neo's window. Then we spend ~28 seconds with each of them just saying "yup, my window is still full". Eventually neo4j opens its window again, py2neo sends a bit more data, and then py2neo opens its window. Both of them send a bit more data, then py2neo finishes sending its request, and neo4j sends more data before also finishing.
So I'm thinking that maybe the problem is something like, both of them are refusing to process more data until they've sent some more, and neither can send some more until the other processes some. Eventually neo4j enters a "something's gone wrong" loop, which py2neo interprets as "go ahead and send more data".
It's interesting, but I'm not sure what it means, that the penultimate TCP frame sent from neo4j to py2neo starts \r\n1280\r\n - the beginning of the fake-chunk. The \r\n8000\r\n that starts the actual chunk, just appears part-way through an unremarkable TCP frame. (It was the third frame sent after py2neo finished sending its post request.)
EDIT 2: I checked to see precisely where Python was hanging. Unsurprisingly, it was while sending the request - so BatchRequestList._execute() doesn't return until after neo4j gives up, which is why neither .run() nor .stream() did any better than .submit().
It appears that a workaround is to set the header X-Stream: true;format=pretty. (By default it's just true; it used to be pretty, but that was removed due to this bug, which looks like it's actually a neo4j bug and still seems to be open, but isn't currently an issue for me.)
It looks like, by setting format=pretty, we cause neo4j to not send any data until it's processed the whole of the input. So it doesn't try to send data, doesn't block while sending, and doesn't refuse to read until it's sent something.
Removing the X-Stream header entirely, or setting it to false, seems to have the same effect as setting format=pretty (as in, making neo4j send a response which is chunked, pretty-printed, doesn't contain status codes, and doesn't get sent until the whole request has been processed), which is kinda weird.
You can set the header for an individual batch with
batch._batch._headers['X-Stream'] = 'true;format=pretty'
Or set the global headers with
neo4j._add_header('X-Stream', 'true;format=pretty')
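For context, a hedged sketch of how the global variant might be applied in a py2neo 1.6 script; the database URL is a placeholder and the comment stands in for the real batch contents:
from py2neo import neo4j

# Apply the workaround before building any batches (py2neo 1.6; the URL below
# is a placeholder for your own server).
neo4j._add_header('X-Stream', 'true;format=pretty')

graph_db = neo4j.GraphDatabaseService("http://localhost:7474/db/data/")
batch = neo4j.WriteBatch(graph_db)
# ... queue up the ~1800 requests exactly as before ...
results = batch.submit()  # the large batch should no longer stall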

Parallel HTTP web crawler in Erlang

I'm working on a simple web crawler and have generated a bunch of static files that I try to crawl with the code at the bottom. I have two issues/questions I don't have an idea for:
1.) Looping over the sequence 1..200 throws me an error exactly after 100 pages have been crawled:
** exception error: no match of right hand side value {error,socket_closed_remotely}
in function erlang_test_01:fetch_page/1 (erlang_test_01.erl, line 11)
in call from lists:foreach/2 (lists.erl, line 1262)
2.) How can I parallelize the requests, e.g. 20 concurrent requests?
-module(erlang_test_01).
-export([start/0]).

-define(BASE_URL, "http://46.4.117.69/").

to_url(Id) ->
    ?BASE_URL ++ io_lib:format("~p", [Id]).

fetch_page(Id) ->
    Uri = to_url(Id),
    {ok, {{_, Status, _}, _, Data}} = httpc:request(get, {Uri, []}, [], [{body_format, binary}]),
    Status,
    Data.

start() ->
    inets:start(),
    lists:foreach(fun(I) -> fetch_page(I) end, lists:seq(1, 200)).
1. Error message
socket_closed_remotely indicates that the server closed the connection, maybe because you made too many requests in a short timespan.
2. Parallelization
Create 20 worker processes and one process holding the URL queue. Let each process ask the queue for a URL (by sending it a message). This way you can control the number of workers.
An even more "Erlangy" way is to spawn one process for each URL! The upside to this is that your code will be very straightforward. The downside is that you cannot control your bandwidth usage or number of connections to the same remote server in a simple way.

Where can I find the default timeout settings for all browsers?

I'm looking for some kind of documentation that specifies how much time each browser (IE6/IE7/FF2/FF3, etc) will wait on a request before it just gives up and times out.
I haven't had any luck trying to find this.
Any pointers?
I managed to find network.http.connect.timeout for much older versions of Mozilla:
This preference was one of several added to allow low-level tweaking of the HTTP networking code. After a portion of the same code was significantly rewritten in 2001, the preference ceased to have any effect (as noted in all.js as early as September 2001).
Currently, the timeout is determined by the system-level connection establishment timeout. Adding a way to configure this value is considered low-priority.
It would seem that network.http.connect.timeout hasn't done anything for some time.
I also saw references to network.http.request.timeout, so I did a Google search. The results include lots of links to people recommending that others include it in about:config in what appears to be a mistaken belief that it actually does something, since the same search turns up this about:config entries article:
Pref removed (unused). Previously: HTTP-specific network timeout. Default value is 120.
The same page includes additional information about network.http.connect.timeout:
Pref removed (unused). Previously: determines how long to wait for a response until registering a timeout. Default value is 30.
Disclaimer: The information on the MozillaZine Knowledge Base may be incorrect, incomplete or out-of-date.
After the last Firefox update we had the same session timeout issue and the following setting helped to resolve it.
We can control it with the network.http.response.timeout parameter.
1. Open Firefox, type 'about:config' in the address bar and press Enter.
2. Click the "I'll be careful, I promise!" button.
3. Type 'timeout' in the search box and the network.http.response.timeout parameter will be displayed.
4. Double-click the network.http.response.timeout parameter and enter the time value (in seconds) you want to allow before the session times out.
Firstly, I don't think there is just one solution to your problem...
As you know, each browser is vastly different.
But let's see if we can get any closer to the answer you need...
I think IE might be easy...
Check this link
http://support.microsoft.com/kb/181050
For Firefox, try this:
Open Firefox and type "about:config" (without quotes) in the address bar. Scroll down to network.http.keep-alive and make sure it is set to "true"; if it is not, double-click it and it will change from false to true. Then go one below that, to network.http.keep-alive.timeout, and change that number by double-clicking it. If you put in, say, 500, you should be good. Let us know if this helps at all.
For Google Chrome (Tested on ver. 62)
I was trying to keep a socket connection alive from Google Chrome's fetch API to a remote Express server and found that the request headers have to match Node.js's native net.Socket connection settings.
I set the headers object on my client-side script with the following options:
/* ----- */
const head = new Headers();
head.append("Connection", "keep-alive");
head.append("Keep-Alive", `timeout=${1 * 60 * 5}`); // in seconds, not milliseconds
/* apply more definitions to the header */
fetch(url, {
    method: 'OPTIONS',
    credentials: "include",
    body: JSON.stringify(data),
    mode: 'cors',
    headers: head, // could be an object literal too
    cache: 'default'
})
.then(response => {
    // ....
})
.catch(err => { /* ... */ });
And on my express server I setup my router as follows:
router.head('absolute or regex', (request, response, next) => {
    request.setTimeout(1000 * 60 * 5, () => {
        console.info("socket timed out");
    });
    console.info("Proceeding down the middleware chain link...\n\n");
    next();
});

/* Keep the socket alive by enabling it on the server, with an optional
   delay after the last data packet */
server.on('connection', (socket) => socket.setKeepAlive(true, 10));
WARNING
Please use common sense and make sure the users you're keeping the socket connection open for are validated and serialized. It works for Firefox as well, but it's really vulnerable if you keep the TCP connection open for longer than 5 minutes.
I'm not sure how some of the lesser known browsers operate, but I'll append to this answer with the Microsoft browser details as well.
