Haskell System.Timeout.timeout crashing when called from certain function - http

I'm scraping some data from the frontpages of a list of website domains. Some of them are not answering, or are very slow, causing the scraper to halt.
I wanted to solve this by using a timeout. The various HTTP libraries available don't seem to support that, but System.Timeout.timeout seems to do what I need.
Indeed, it seems to work fine when I test the scraping function on its own, but it crashes as soon as I run the enclosing function. (Sorry for the bad/ugly code; I'm learning.)
fetchPage domain =
    -- Try to read the file from disk.
    catch
        (System.IO.Strict.readFile $ "page cache/" ++ domain)
        (\e -> downloadAndCachePage domain)

downloadAndCachePage domain =
    catch
        (do
            -- Failed, so try to download it.
            -- This crashes when called by fetchPage, but works fine when called directly.
            maybePage <- timeout 5000000 (simpleHTTP (getRequest ("http://www." ++ domain)) >>= getResponseBody)
            let page = fromMaybe "" maybePage
            -- This mostly works, but won't time out if the domain is slow. (lswb.com.cn)
            -- page <- (simpleHTTP (getRequest ("http://www." ++ domain)) >>= getResponseBody)
            -- Cache it.
            writeFile ("page cache/" ++ domain) page
            return page)
        (\e -> catch
            (do
                -- Failed, so just fuggeddaboudit.
                writeFile ("page cache/" ++ domain) ""
                return "")
            (\e -> return "")) -- Failed BIG, so just don't give a crap.
downloadAndCachePage works fine with the timeout when called from the REPL, but fetchPage crashes. If I remove the timeout from downloadAndCachePage, fetchPage works.
Can anyone explain this, or suggest an alternative solution?

Your catch handler in fetchPage looks wrong -- it seems you're trying to read a file, and on a file-not-found exception you call your HTTP function directly from inside the exception handler. Don't do this. For complicated reasons, as I recall, code in exception handlers doesn't always behave like normal code -- particularly when it attempts to handle exceptions itself. And indeed, under the covers, timeout uses asynchronous exceptions to kill threads.
In general, you should put as little code as possible in exception handlers, and especially not code that tries to handle further exceptions (although it is generally fine to re-raise a handled exception to "pass it on", as bracket does).
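As a rough sketch of one way to restructure it (not the only option, and assuming the same Network.HTTP, strict-IO and System.Timeout functions used in the question): try converts the exception into an Either value, so the fallback work runs as ordinary code outside any handler, where timeout's asynchronous exception can behave normally.

{-# LANGUAGE ScopedTypeVariables #-}

import Control.Exception (SomeException, try)
import Data.Maybe (fromMaybe)
import Network.HTTP (getRequest, getResponseBody, simpleHTTP)
import System.Timeout (timeout)
import qualified System.IO.Strict

-- Try the cache first; only the file read happens inside the exception scope.
fetchPage :: String -> IO String
fetchPage domain = do
    cached <- try (System.IO.Strict.readFile ("page cache/" ++ domain))
    case cached of
        Right page                -> return page
        Left (_ :: SomeException) -> downloadAndCachePage domain

-- The download runs as ordinary code, not inside a handler.
downloadAndCachePage :: String -> IO String
downloadAndCachePage domain = do
    result <- try (timeout 5000000
                     (simpleHTTP (getRequest ("http://www." ++ domain)) >>= getResponseBody))
    let page = case result of
                 Right maybePage           -> fromMaybe "" maybePage
                 Left (_ :: SomeException) -> ""
    -- Cache whatever we got (the empty string on timeout/failure, as in the original).
    writeFile ("page cache/" ++ domain) page
    return page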
That said, even if you're not doing the right thing, a crash (if it is a segfault-type crash as opposed to a <<loop>>-type crash), even from weird code, is nearly always wrong behavior from GHC, and if you're on GHC 7 you should consider reporting it.

How to get visibility into completion queue on C++ gRPC server

Note: Help with the immediate problem would be great, but mostly I'm looking for advice on troubleshooting gRPC timing issues in general (this isn't my first such issue).
I am adding a new server streaming service to a C++ module which has an existing server streaming service, and the two appear to be conflicting. Specifically, the completion queue Next() call on the server is crashing intermittently after the C# client calls Cancel() on the cancellation token for one of the services. This doesn't happen if I run each service independently.
On the client, I get this at the response stream MoveNext() call:
System.InvalidOperationException
HResult=0x80131509
Message=Shutdown has already been called
Source=Grpc.Core
StackTrace:
at Grpc.Core.Internal.CompletionQueueSafeHandle.BeginOp()
at Grpc.Core.Internal.CallSafeHandle.StartReceiveMessage(IReceivedMessageCallback callback)
at Grpc.Core.Internal.AsyncCallBase`2.ReadMessageInternalAsync()
at Grpc.Core.Internal.ClientResponseStream`2.<MoveNext>d__5.MoveNext()
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at MyModule.Connection.<DoSubscriptionReceives>d__7.MoveNext() in C:\snip\Connection.cs:line 67
On the server, I get this at the completion queue Next() call:
Exception thrown: read access violation.
core_cq_tag->**** was 0xDDDDDDDD.
The stack trace:
MyModule.exe!grpc_impl::CompletionQueue::AsyncNextInternal(void * * tag, bool * ok, gpr_timespec deadline) Line 59 C++
> MyModule.exe!grpc_impl::CompletionQueue::Next(void * * tag, bool * ok) Line 176 C++
...snip...
It appears something is being added to the queue after shutdown. The difficulty is I have little visibility into what is being added into the queue and in what order.
I'm trying to write a server-side interceptor to log all requests & responses, but there seems to be no documentation. So far, poking through the API hasn't gotten me very far. Is there any documentation available on wiring up an interceptor in C++? Or, are there other approaches for troubleshooting timing conflicts between services?
Windows 11, Grpc.Core 1.27
What I've tried:
I first played with the GRPC_TRACE & GRPC_VERBOSITY environment variables. I was able to get some unhelpful output from the client, but nothing from the server. Of course, there's been lots of debugging: stripping the client & server down to bare bones, disabling keepalives, ensuring we aren't using deadlines, having the services share a cancellation token, etc.
Update: I have found that the crash only happens when the client is run from an NUnit test. In that environment, the completion queue is getting more hits on Next(), but I'm still trying to figure out where they are coming from.
Is 1.27 the version you are using? That seems pretty old; there might have been fixes since then.
For using the C++ server interception API, I think you would find this very useful - https://github.com/grpc/grpc/blob/0f2a0f5fc9b9e9b9c98d227d16575d106f1e8d43/test/cpp/end2end/server_interceptors_end2end_test.cc#L48
One suggestion I have is to run the code under sanitizers (https://github.com/google/sanitizers) to make sure that you don't have a heap-use-after-free type bug.
I would also check for API misuse issues. (If you had posted the code, I could have taken a look to see if anything seems weird.)
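For illustration, here is a rough sketch of server-interceptor wiring in the style of that test file. It logs a line per send/receive hook so you can reconstruct the order of operations hitting the completion queue. The experimental class and hook-point names below are my reading of that API and should be verified against the headers in your gRPC version.

#include <iostream>

#include <grpcpp/grpcpp.h>
#include <grpcpp/support/server_interceptor.h>

// Logs selected hook points for every RPC handled by the server.
class LoggingInterceptor : public grpc::experimental::Interceptor {
 public:
  explicit LoggingInterceptor(grpc::experimental::ServerRpcInfo* info)
      : info_(info) {}

  void Intercept(grpc::experimental::InterceptorBatchMethods* methods) override {
    if (methods->QueryInterceptionHookPoint(
            grpc::experimental::InterceptionHookPoints::POST_RECV_MESSAGE)) {
      std::cout << "recv message: " << info_->method() << std::endl;
    }
    if (methods->QueryInterceptionHookPoint(
            grpc::experimental::InterceptionHookPoints::PRE_SEND_MESSAGE)) {
      std::cout << "send message: " << info_->method() << std::endl;
    }
    methods->Proceed();  // always continue processing the batch
  }

 private:
  grpc::experimental::ServerRpcInfo* info_;
};

class LoggingInterceptorFactory
    : public grpc::experimental::ServerInterceptorFactoryInterface {
 public:
  grpc::experimental::Interceptor* CreateServerInterceptor(
      grpc::experimental::ServerRpcInfo* info) override {
    return new LoggingInterceptor(info);
  }
};

// Registration on the ServerBuilder, before BuildAndStart():
//   std::vector<std::unique_ptr<grpc::experimental::ServerInterceptorFactoryInterface>> creators;
//   creators.push_back(std::unique_ptr<grpc::experimental::ServerInterceptorFactoryInterface>(
//       new LoggingInterceptorFactory()));
//   builder.experimental().SetInterceptorCreators(std::move(creators));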

Using the ngx.timer.at function in init_worker_by_lua_file does not seem to work

I want to use ngx.timer.at to start another thread besides the "main" thread of the Lua worker process. From the documentation, it looks like ngx.timer.at makes this pretty easy to achieve. However, with the simple code below, the timer does not seem to actually run in the background. I tried logging to the log file when it starts, but the log message never appeared.
local function _hello(premature)
    ngx.log(ngx.ERR, "Hello world")
    if premature then
        return
    end
end

ngx.timer.at(0, _hello)
"Hello world" did not appear in the log file and I have no idea the new thread was ever been created successfully.
Any ideas?
It turned out this was caused by another problem. Closing it here:
https://groups.google.com/forum/#!topic/openresty-en/m-5a1Xrpruw

OCaml Unix Error

I have run into an error that I am not sure how to debug. The error is Exception: (Unix.Unix_error "Too many open files" pipe ""). I am not opening any files and only have a single Unix process open. Anybody have some tips on how to debug this?
The function causing the error is:
let rec update_act_odrs ?(sec_to_wait = 0.0) () =
  try
    (act_odrs := active_orders ())
    |> fun _ -> Lwt_io.print "active_orders Updated\n"
  with _ ->
    Lwt_unix.sleep sec_to_wait
    >>= update_act_odrs ~sec_to_wait:(sec_to_wait +. 1.0)
where active_orders () is a function that gets JSON data from a server.
I would suggest using ltrace to trace the calls to the open or pipe functions. It is also a good idea to simply grep your codebase for functions that usually open descriptors, e.g. openfile, socket, pipe, popen.
Also, be aware that the failing function is not always the root of the evil: the descriptors can be eaten up by some other function in your process.
You can also look at the /proc folder, if you're on Linux, to make sure that your process really is consuming that many fds. Maybe another process on your system, launched by your application, is responsible for the fd leak.
Finally, from the code you've shown, I can conclude that the only possible source of the leak is the active_orders function. If it downloads JSON data from a server, it should open a socket connection. The fact that the error message points to pipe is strange; maybe it is implemented with the popen or system functions.
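If you are on Linux, one rough way to watch this from inside the program is to count the entries in /proc/self/fd around calls to active_orders. This is only a sketch using the standard library: /proc/self/fd is Linux-specific, and the count includes the descriptor that the directory listing itself uses.

(* Linux only: each entry in /proc/self/fd is one open descriptor of this process. *)
let count_open_fds () =
  Array.length (Sys.readdir "/proc/self/fd")

let () =
  Printf.printf "open fds before: %d\n%!" (count_open_fds ());
  (* ... call active_orders () here ... *)
  Printf.printf "open fds after:  %d\n%!" (count_open_fds ())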

Erlang stopping application doesn't end all processes?

When I stop an Erlang application that I built, the cowboy listener process stays alive and continues to handle requests. In the gen_server that I wrote, I start the server in init, as you can see below:
init([Port]) ->
    Dispatch = cowboy_router:compile([
        {'_', [
            {"/custom/[...]", ?MODULE, []},
            % Serve index.html as default file
            % Serve entire directory
            {"/[...]", cowboy_static, {priv_dir, app, "www"}}
        ]}
    ]),
    Name = custom_name,
    {ok, Pid} = cowboy:start_http(Name, 100,
        [{port, Port}],
        [{env, [{dispatch, Dispatch}]}]),
    {ok, #state{handler_pid = Pid}}.
This starts the cowboy HTTP server, which uses cowboy_static to serve some files from the priv/app/ dir and the current module to handle custom requests (the module implements all the cowboy HTTP handler callbacks). It takes the pid returned from the call and assigns it to handler_pid in the state record. This all works. However, when I start up the application containing this module (which works) and then stop it, all processes end (at least the ones in my application) and the custom handler (which is implemented in the same module as the gen_server) no longer works, but the cowboy_static handler continues to handle requests. It keeps serving static files until I kill the node. I tried fixing this by adding this to the gen_server:
terminate(_Reason, State) ->
    exit(State#state.handler_pid, normal),
    cowboy:stop_listener(listener_name()),
    ok.
But nothing changes. The cowboy_static handler continues to serve static files.
Questions:
Am I doing anything wrong here?
Is cowboy_static running under the cowboy application? I assume it is.
If so, how do I stop it?
And also, should I be concerned about stopping it? Maybe this isn't that big a deal.
Thanks in advance!
I don't think it is really important; generally you use one node/VM per application (in fact a bunch of Erlang applications working together, but I don't have a better word). But I think you can stop the server using application:stop(cowboy), application:stop(ranch).
You should fix three things (sketched below):
the symbol passed to start_http(Name, ...) and stop_listener(Name) should match;
trap exits in the gen_server's init: process_flag(trap_exit, true);
remove the exit call from terminate.
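A minimal sketch of what those three fixes look like together, assuming the cowboy 1.x API already used in the question and keeping its custom_name listener name:

init([Port]) ->
    %% Trap exits so terminate/2 runs when the application shuts this server down.
    process_flag(trap_exit, true),
    Dispatch = cowboy_router:compile([
        {'_', [
            {"/custom/[...]", ?MODULE, []},
            {"/[...]", cowboy_static, {priv_dir, app, "www"}}
        ]}
    ]),
    {ok, _Pid} = cowboy:start_http(custom_name, 100,
        [{port, Port}],
        [{env, [{dispatch, Dispatch}]}]),
    {ok, #state{}}.

terminate(_Reason, _State) ->
    %% Same listener name as in start_http/4; no manual exit/2 call.
    cowboy:stop_listener(custom_name),
    ok.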

Unmasking not-found errors and seeing the real exceptions in Plone

The following is from Zope's BaseRequest.py:
# traverseName() might raise ZTK's NotFound
except (KeyError, AttributeError, ztkNotFound):
    if response.debug_mode:
        return response.debugError(
            "Cannot locate object at: %s" % URL)
    else:
        return response.notFoundError(URL)
It translates various exceptions into a not-found page. This is very bad for site developers, who cannot tell what actually went wrong on the site.
How does one disable this mechanism (there is clearly a response.debug_mode check), so that you would see the real exceptions:
When Plone runs in debug mode
In unit tests and functional tests
When Plone runs in production mode (e.g. temporarily, to see why some URL really fails)
