Celluloid & performant HTTP requests - http

I try to switch an existing crawler from EventMachine to Celluloid. To get in touch with Celluloid I've generated a bunch of static files with 150 kB per file on a linux box which are served via Nginx.
The code at the bottom should do its work, but there is a issues with the code which I don't understand: the code should spawn maximum 50 threads because of the thread pool size of 50 but it spawns 180 of them. If I increase the pool size to 100, 330 threads are spawned. What's going wrong there?
A simple copy & paste of this code should work on every box, so any hints are welcome :)
#!/usr/bin/env jruby
require 'celluloid'
require 'open-uri'
URLS = *(1..1000)
##requests = 0
##responses = 0
##total_size = 0
class Crawler
include Celluloid
def fetch(id)
uri = URI("http://data.asconix.com/#{id}")
puts "Request ##{##requests += 1} -> #{uri}"
begin
req = open(uri).read
rescue Exception => e
puts e
end
end
end
URLS.each_slice(50).map do |idset|
pool = Crawler.pool(size: 50)
crawlers = idset.to_a.map do |id|
begin
pool.future(:fetch, id)
rescue Celluloid::DeadActorError, Celluloid::MailboxError
end
end
crawlers.compact.each do |resp|
$stdout.print "Response ##{##responses += 1} -> "
if resp.value.size == 150000
$stdout.print "OK\n"
##total_size += resp.value.size
else
$stdout.print "ERROR\n"
end
end
pool.terminate
puts "Actors left: #{Celluloid::Actor.all.to_set.length} -- Alive: #{Celluloid::Actor.all.to_set.select(&:alive?).length}"
end
$stdout.print "Requests total: #{##requests}\n"
$stdout.print "Responses total: #{##responses}\n"
$stdout.print "Size total: #{##total_size} bytes\n"
By the way, the same issue occurs when I define the pool outside the each_slice loop:
....
#pool = Crawler.pool(size: 50)
URLS.each_slice(50).map do |idset|
crawlers = idset.to_a.map do |id|
begin
#pool.future(:fetch, id)
rescue Celluloid::DeadActorError, Celluloid::MailboxError
end
end
crawlers.compact.each do |resp|
$stdout.print "Response ##{##responses += 1} -> "
if resp.value.size == 150000
$stdout.print "OK\n"
##total_size += resp.value.size
else
$stdout.print "ERROR\n"
end
end
puts "Actors left: #{Celluloid::Actor.all.to_set.length} -- Alive: #{Celluloid::Actor.all.to_set.select(&:alive?).length}"
end

What ruby are you using? jRuby, Rubinius, etc? And what version of those?
Reason I ask is, threads are handled differently for each ruby. What you seem to be describing is added threads that come in for supervisors, and for tasks. Looking at the date of your post, it is likely that fibers are actually becoming native threads, which would make it seem like using jRuby perhaps. Also, using Futures often invokes the internal thread pool, which has nothing to do with your pool.
With both those reasons and others like them you could look for, that makes sense why you'd have a higher thread count than your pool calls for. This is a bit old though, so maybe you could follow up as to whether you still have this problem, and post output.

Related

How to check if can write to a folder

In julia, how do I check if the current is allowed to write to a folder?
I could do the python way, and just attempt to do it, and then fail fail and recover.
(In my case I can definitely recover, I have a list of locations to attempt to write to, as fallbacks. I expect the first few not to work (The first few are shared locations, so only computer admins are likely to have permission to writer there)
Python has also os.access function. Maybe Julia will have something similar in the future. Now we could borrow idea. :)
It is implemented in posixmodule.c (also functionality for windows!) so if you are on posix you could simply mimic:
julia> const R_OK = 4 # readability
julia> const W_OK = 2 # writability
julia> const X_OK = 1 # executability
julia> const F_OK = 4 # existence
julia> access(path, mode) = ccall(:access, Cint, (Cstring, Cint), path, mode) == 0;
Small test:
julia> access("/root", W_OK)
false
julia> access("/tmp", W_OK)
true
(for windows it could be just a little more complicated... But I could not test it now)
EDIT:
Thanks to Matt B. we could use libuv support in Julia which has to be portable (although slower on posix systems):
julia> function uv_access(path, mode)
local ret
req = Libc.malloc(Base._sizeof_uv_fs)
try
ret = ccall(:uv_fs_access, Int32, (Ptr{Void}, Ptr{Void}, Cstring, Int64, Ptr{Void}), Base.eventloop(), req, path, mode, C_NULL)
ccall(:uv_fs_req_cleanup, Void, (Ptr{Void},), req)
finally
Libc.free(req)
end
return ret, ret==0 ? "OK" : Base.struverror(ret)
end
julia> uv_access("/tmp", W_OK)
(0, "OK")
julia> uv_access("/root", W_OK)
(-13, "permission denied")
julia> uv_access("/nonexist", W_OK)
(-2, "no such file or directory")
Is the following sufficient:
julia> testdir(dirpath) = try (p,i) = mktemp(dirpath) ; rm(p) ; true catch false end
testdir (generic function with 1 method)
julia> testdir("/tmp")
true
julia> testdir("/root")
false
Returns true if dirpath is writable (by creating a temporary file inside a try-catch block). To find the first writable directory in a list, the following can be used:
julia> findfirst(testdir, ["/root","/tmp"])
2
Doing apropos("permissions"):
julia> apropos("permissions")
Base.Filesystem.gperm
Base.Filesystem.mkpath
Base.Filesystem.operm
Base.Filesystem.uperm
Base.Filesystem.mkdir
Base.Filesystem.chmod
shows a function called Base.Filesystem.uperm which seems to do exactly what you want it to:
help?> uperm
search: uperm supertype uppercase UpperTriangular isupper unescape_string unsafe_pointer_to_objref
uperm(file)
Gets the permissions of the owner of the file as a bitfield of
Value Description
––––– ––––––––––––––––––
01 Execute Permission
02 Write Permission
04 Read Permission
For allowed arguments, see stat.
Unfortunately it seems to be a bit buggy on my (old v7 nightly) build:
julia> uperm("/root")
0x07 # Uhhh I hope not?
I will update my build and raise a bug if one is not already present.
PS. In case it wasn't clear, I would expect to use this in combination with isdir to detect directory permissions specifically
I don't think that Dan Getz's answer will work on Windows because the temporary file created cannot be deleted while there is an open handle to it, but this amended version with a call to close does work:
function isfolderwritable(folder)
try
(p,i) = mktemp(folder)
close(i)
rm(p)
return(true)
catch
return(false)
end
end

Ownership process error running phoenix test in elixir 1.3.2

I am working through a phoenix tutorial but I have this error:
** (DBConnection.OwnershipError) cannot find ownership process for #PID<0.265.0>.
I am not using Task.start so nothing should be running asynchronously, and I thought that having the mode in the unless tag would be enough to prevent this error, in test/support/channel_case.ex:
setup tags do
:ok = Ecto.Adapters.SQL.Sandbox.checkout(Watchlist.Repo)
unless tags[:async] do
Ecto.Adapters.SQL.Sandbox.mode(Watchlist.Repo, {:shared, self()})
end
:ok
end
So, I am curious how I resolve this error.
This is how I run it:
mix test test/integration/listing_movies_test.exs
I am using elixir 1.3.2
UPDATE
defmodule ListingMoviesIntegrationTest do
use ExUnit.Case, async: true
use Plug.Test
alias Watchlist.Router
#opts Router.init([])
test 'listing movies' do
movie = %Movie{name: "Back to the future", rating: 5}
|> Repo.insert! <== error happens here
conn = conn(:get, "/movies")
response = Router.call(conn, #opts)
assert response.status == 200
assert response.resp_body == movie
end
Full stack trace:
(db_connection) lib/db_connection.ex:718: DBConnection.checkout/2
(db_connection) lib/db_connection.ex:619: DBConnection.run/3
(db_connection) lib/db_connection.ex:463: DBConnection.prepare_execute/4
(ecto) lib/ecto/adapters/postgres/connection.ex:91: Ecto.Adapters.Postgres.Connection.execute/4
(ecto) lib/ecto/adapters/sql.ex:235: Ecto.Adapters.SQL.sql_call/6
(ecto) lib/ecto/adapters/sql.ex:454: Ecto.Adapters.SQL.struct/6
(ecto) lib/ecto/repo/schema.ex:397: Ecto.Repo.Schema.apply/4
(ecto) lib/ecto/repo/schema.ex:193: anonymous fn/11 in Ecto.Repo.Schema.do_insert/4
(ecto) lib/ecto/repo/schema.ex:124: Ecto.Repo.Schema.insert!/4
test/integration/listing_movies_test.exs:13: (test)
and in test_helper which is actually being called as I put in a debug statement:
ExUnit.start
Ecto.Adapters.SQL.Sandbox.mode(Watchlist.Repo, :manual)
Ecto.Adapters.SQL.Sandbox.mode(Watchlist.Repo, {:shared, self()})
use ExUnit.Case, async: true
use Plug.Test
You have code for setup hook in "test/support/channel_case.ex" but you are not using it anywhere in your test or at least it is not clear if you do use it. It would be helpful if you can add this:
IO.puts "#{inspect __MODULE__}: setup is getting called."
somewhere in your setup hook's code. This will make sure that the code actually runs. My suspicion from the comment you made in my previous answer this code is dead.
defmodule ListingMoviesIntegrationTest do
use ExUnit.Case, async: true
use Plug.Test
alias Watchlist.Router
setup do
:ok = Ecto.Adapters.SQL.Sandbox.checkout(Watchlist.Repo)
Ecto.Adapters.SQL.Sandbox.mode(Watchlist.Repo, {:shared, self()})
:ok
end
#opts Router.init([])
test 'listing movies' do
movie = %Movie{name: "Back to the future", rating: 5}
|> Repo.insert! <== error happens here
conn = conn(:get, "/movies")
response = Router.call(conn, #opts)
assert response.status == 200
assert response.resp_body == movie
end
...
I got the same error and in my case on Elixir 1.8.2, Phoenix 1.4.1: after looking at this Elixir forum thread, I changed my test_helpers.exs to have the line below to have the adapter pool mode from manual to auto.
Ecto.Adapters.SQL.Sandbox.mode(MyApp.Repo, :auto)
You can read more about this about the modes and pool checkout and ownership on the Hex Ecto docs
I guess you are skipping this line
Ecto.Adapters.SQL.Sandbox.mode(Watchlist.Repo, {:shared, self()})
because
iex(1)> unless true do
...(1)> IO.puts "test"
...(1)> end
nil
iex(2)> unless false do
...(2)> IO.puts "test"
...(2)> end
test
:ok
Did you try:
if tags[:async] do
Ecto.Adapters.SQL.Sandbox.mode(Watchlist.Repo, {:shared, self()})
end

Killing a Haskell binary

If I press Ctrl+C, this throws an exception (always in thread 0?). You can catch this if you want - or, more likely, run some cleanup and then rethrow it. But the usual result is to bring the program to a halt, one way or another.
Now suppose I use the Unix kill command. As I understand it, kill basically sends a (configurable) Unix signal to the specified process.
How does the Haskell RTS respond to this? Is it documented somewhere? I would imagine that sending SIGTERM would have the same effect as pressing Ctrl+C, but I don't know that for a fact...
(And, of course, you can use kill to send signals that have nothing to do with killing at all. Again, I would imagine that the RTS would ignore, say, SIGHUP or SIGPWR, but I don't know for sure.)
Googling "haskell catch sigterm" led me to System.Posix.Signals of the unix package, which has a rather nice looking system for catching and handling these signals. Just scroll down to the "Handling Signals" section.
EDIT: A trivial example:
import System.Posix.Signals
import Control.Concurrent (threadDelay)
import Control.Concurrent.MVar
termHandler :: MVar () -> Handler
termHandler v = CatchOnce $ do
putStrLn "Caught SIGTERM"
putMVar v ()
loop :: MVar () -> IO ()
loop v = do
putStrLn "Still running"
threadDelay 1000000
val <- tryTakeMVar v
case val of
Just _ -> putStrLn "Quitting" >> return ()
Nothing -> loop v
main = do
v <- newEmptyMVar
installHandler sigTERM (termHandler v) Nothing
loop v
Notice that I had to use an MVar to inform loop that it was time to quit. I tried using exitSuccess from System.Exit, but since termHandler executes in a thread that isn't the main one, it can't cause the program to exit. There might be an easier way to do it, but I've never used this module before so I don't know of one. I tested this on Ubuntu 12.10.
Searching for "signal" in the ghc source code on github revealed the installDefaultSignals function:
void
initDefaultHandlers(void)
{
struct sigaction action,oact;
// install the SIGINT handler
action.sa_handler = shutdown_handler;
sigemptyset(&action.sa_mask);
action.sa_flags = 0;
if (sigaction(SIGINT, &action, &oact) != 0) {
sysErrorBelch("warning: failed to install SIGINT handler");
}
#if defined(HAVE_SIGINTERRUPT)
siginterrupt(SIGINT, 1); // isn't this the default? --SDM
#endif
// install the SIGFPE handler
// In addition to handling SIGINT, also handle SIGFPE by ignoring it.
// Apparently IEEE requires floating-point exceptions to be ignored by
// default, but alpha-dec-osf3 doesn't seem to do so.
// Commented out by SDM 2/7/2002: this causes an infinite loop on
// some architectures when an integer division by zero occurs: we
// don't recover from the floating point exception, and the
// program just generates another one immediately.
#if 0
action.sa_handler = SIG_IGN;
sigemptyset(&action.sa_mask);
action.sa_flags = 0;
if (sigaction(SIGFPE, &action, &oact) != 0) {
sysErrorBelch("warning: failed to install SIGFPE handler");
}
#endif
#ifdef alpha_HOST_ARCH
ieee_set_fp_control(0);
#endif
// ignore SIGPIPE; see #1619
// actually, we use an empty signal handler rather than SIG_IGN,
// so that SIGPIPE gets reset to its default behaviour on exec.
action.sa_handler = empty_handler;
sigemptyset(&action.sa_mask);
action.sa_flags = 0;
if (sigaction(SIGPIPE, &action, &oact) != 0) {
sysErrorBelch("warning: failed to install SIGPIPE handler");
}
set_sigtstp_action(rtsTrue);
}
From that, you can see that GHC installs at least SIGINT and SIGPIPE handlers. I don't know if there are any other signal handlers hidden in the source code.

Kicking clients from server (Erlang)

I'm new to Erlang and I am writing a basic server. I am trying to figure out how to correctly kick a client from the server using the information that I have about the client (which is Pid, Client_socket, and Client_name.
Any suggestions would be great and much appreciated. Thanks for reading :)
Here's my code so far:
-module(cell_clnt).
-export([cell_client/0]).
cell_client()->
%%% Add any needed parameters for your cell process here
Port = 21,
Pending_connections = 5,
Cell = fun()-> cell_process() end,
spawn(fun()-> timer:sleep(10), keyboard_loop(Cell) end),
receive
stop->
ok
end.
keyboard_loop(Cell)->
case io:get_line(">> ") of
"quit\n"->
io:fwrite("Exiting...~n"),
if is_pid(Cell)-> Cell!stop; true->ok end;
"start\n" when is_function(Cell)->
keyboard_loop(spawn(Cell));
Input when is_pid(Cell)->
Cell!{input,Input},
keyboard_loop(Cell);
_Input->
io:fwrite("No cell process active yet!~n"),
keyboard_loop(Cell)
end.
%%% Edit this to implement your cell process %%%
cell_process()->
io:fwrite("In cell~n"),
{ok,Listening_socket} = gen_tcp:listen(21,
[binary,
{backlog,5},
{active,false},
{packet,line}]),
loop(Listening_socket,[]).
loop(Listening_socket, Clients)->
io:format("Clients: ~p", [Clients]),
case gen_tcp:accept(Listening_socket) of
{ok,Client_socket} ->
gen_tcp:send(Client_socket, "Hello, what is your name?"),
{_,Name} = gen_tcp:recv(Client_socket,0),
gen_tcp:send(Client_socket, "Hello, "),
gen_tcp:send(Client_socket, Name),
Pid = spawn(fun()-> client_loop(Client_socket) end),
loop(Listening_socket,[{Pid,Client_socket,Name}|Clients])
end.
client_loop(Client_socket)->
case gen_tcp:recv(Client_socket,0) of
{ok,Message}-> gen_tcp:send(Client_socket,Message),
client_loop(Client_socket);
{error,Why}-> io:fwrite("Error: ~s~n",[Why]),
gen_tcp:close(Client_socket)
end.
Use when you need close a TCP socket and kill process:
gen_tcp:close(Socket)
exit(Pid, kill).
You can close a socket by killing the pid, like this:
erlang:exit(Pid, kill)

ErrorClosed exception from Network.HTTP.simpleHTTP -- trying to upload images via XML-RPC with haxr

I'm trying to use haxr 3000.8.5 to upload images to a WordPress blog using the metaWeblog API---specifically, the newMediaObject method.
I've gotten it to work for small images, having successfully uploaded 20x20 icons in both PNG and JPG formats. However, when I try medium-sized images (say, 300x300) I get an ErrorClosed exception, presumably from the HTTP package (I did a bit of source diving, and found that haxr ultimately calls Network.HTTP.simpleHTTP).
Can anyone shed light on the reasons why a call to simpleHTTP might fail with ErrorClosed? Suggestions of things to try and potential workarounds are also welcome.
Here are links to full tcpdump output from a successful upload and from an unsuccessful upload.
The (sanitized) code is also shown below, in case it's of any use.
import Network.XmlRpc.Client (remote)
import Network.XmlRpc.Internals (Value(..), toValue)
import Data.Char (toLower)
import System.FilePath (takeFileName, takeExtension)
import qualified Data.ByteString.Char8 as B
import Data.Functor ((<$>))
uploadMediaObject :: FilePath -> IO Value
uploadMediaObject file = do
media <- mkMediaObject file
remote "http://someblog.wordpress.com/xmlrpc.php" "metaWeblog.newMediaObject"
"default" "username" "password" media
-- Create the required struct representing the image.
mkMediaObject :: FilePath -> IO Value
mkMediaObject filePath = do
bits <- B.unpack <$> B.readFile filePath
return $ ValueStruct
[ ("name", toValue fileName)
, ("type", toValue fileType)
, ("bits", ValueBase64 bits)
]
where
fileName = takeFileName filePath
fileType = case (map toLower . drop 1 . takeExtension) fileName of
"png" -> "image/png"
"jpg" -> "image/jpeg"
"jpeg" -> "image/jpeg"
"gif" -> "image/gif"
main = do
v <- uploadMediaObject "images/puppy.png"
print v
21:59:56.813021 IP 192.168.1.148.39571 > ..http: Flags [.]
22:00:01.922598 IP ..http > 192.168.1.148.39571: Flags [F.]
The connection is closed by the server after a 3-4 sec timeout as the client didn't send any data, to prevent slowloris and similar ddos attacks. (F is the FIN flag, to close one direction of the bidirectional connection).
The server does not wait for the client to close the connection (wait for eof/0 == recv(fd)) but uses the close() syscall; the kernel on the server will respond with [R]eset packets if it receives further data, as you can see at the end of your dump.
I guess the client first opens the http connection and then prepares the data which takes too long.

Resources