Multiple HTTP requests on the same long-running operation with Sinatra and EventMachine

I'm trying to understand how to use evented web servers with a combination of async Sinatra and EventMachine.
In the code below, each request on '/' generates a new async HTTP request to Google. Is there an elegant way to detect that a backend request is already in progress and wait for it to complete?
If I have 100 concurrent requests on '/', this generates 100 requests to the Google backend; it would be much better to detect the ongoing backend request and share its result.
Thanks for the answer.
require 'sinatra'
require 'json'
require 'eventmachine'
require 'em-http-request'
require 'sinatra/async'

Sinatra.register Sinatra::Async

def get_data
  puts "Start request"
  http = EventMachine::HttpRequest.new("http://www.google.com").get
  http.callback {
    puts "Request completed"
    yield http.response
  }
end

aget '/' do
  get_data { |data| body data }
end
Update
I discovered you can attach several callbacks to the same HTTP request, so this is easy to implement:
class Request
  def get_data
    # Start a new request unless one is still in flight
    # (response_header.status stays 0 until the request completes)
    if !@http || @http.response_header.status != 0
      # puts "Creating new request"
      @http = EventMachine::HttpRequest.new("http://www.bbc.com").get
    end
    # puts "Adding callback"
    @http.callback do
      # puts "Request completed"
      yield @http.response
    end
  end
end

$req = Request.new

aget '/' do
  $req.get_data { |data| body data }
end
This gives a very high number of requests per second. Cool!

You don't have to use sinatra/async at all to make it evented; just run it on an evented server (Thin, Rainbows!, Goliath).
Take a look at em-synchrony for an example of making multiple parallel requests without introducing spaghetti callback code:
require "em-synchrony"
require "em-synchrony/em-http"
EventMachine.synchrony do
multi = EventMachine::Synchrony::Multi.new
multi.add :a, EventMachine::HttpRequest.new("http://www.postrank.com").aget
multi.add :b, EventMachine::HttpRequest.new("http://www.postrank.com").apost
res = multi.perform
p "Look ma, no callbacks, and parallel HTTP requests!"
p res
EventMachine.stop
end
And yes, you can run this inside your Sinatra action.
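For instance, a minimal sketch of that (assuming Thin as the server and the rack-fiber_pool gem so each request runs in its own fiber; the Google URL is just a stand-in):

require "sinatra"
require "rack/fiber_pool"
require "em-synchrony"
require "em-synchrony/em-http"

use Rack::FiberPool  # run each request in a fiber so fiber-aware calls can park

get '/' do
  # em-synchrony's patched #get parks this fiber until the response arrives,
  # leaving the reactor free to serve other requests
  EventMachine::HttpRequest.new("http://www.google.com").get.response
end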
Also take a look at Faraday, specifically with its EventMachine adapter.
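A rough sketch with an older (pre-1.0) Faraday, where the EventMachine adapters still ship in the gem; it has to run inside a synchrony fiber:

require "faraday"
require "em-synchrony"

EventMachine.synchrony do
  conn = Faraday.new(url: "http://www.google.com") do |f|
    f.adapter :em_synchrony  # fiber-aware EventMachine adapter
  end

  resp = conn.get("/")
  p resp.status

  EventMachine.stop
end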

Related

Elixir - How do I manage routes using Cowboy (and nothing else)?

I'm restricted to using only Cowboy for a web server that handles a JSON REST API. I need to manage and process different, variable routes and read the GET values using only Cowboy plus the language's own capabilities.
I'm getting the path as shown in the following routine:
def handle(req, router) do
  headers = [{"content-type", "application/json"}]
  {path, req} = :cowboy_req.path(req)
  {:ok, resp} = :cowboy_req.reply(200, headers, router.call(path), req)
  {:ok, resp, router}
end
And ultimately router.call(path) calls the following:
defp serve("/call/[:thing]") do
list = [path: "oy"]
IO.puts :thing
{status, result} = JSON.encode(list)
result
end
By itself, serve("/call") returns the JSON without issues, but requesting any other route under /call makes the server answer with the 404 response (already handled by me).
What's the best approach when handling these dynamic routes? Bear in mind that I'm delimited to only using Cowboy and nothing else.
Your code is not very clear: how did you start the server? More specifically, how did you set up your router? This seems to be the problem here; I'm guessing you made a route only for /call.
You'd need something like this:
dispatch_config = :cowboy_router.compile([
  {:_, [{"/call/[:thing]", YourHandlerModule, []}]}
])

{:ok, _} = :cowboy.start_http(:http,
  100,
  [{:port, 8080}],
  [{:env, [{:dispatch, dispatch_config}]}]
)
The path /call/[:thing] should be specified at the router, not inside your handler.
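Inside the handler you can then read the binding. A rough sketch for Cowboy 1.x (the handler shape mirrors the question's; JSON is whatever encoder you already use, and the binding is :undefined when the optional segment is absent):

def handle(req, state) do
  {thing, req} = :cowboy_req.binding(:thing, req)
  headers = [{"content-type", "application/json"}]
  {:ok, body} = JSON.encode([thing: thing])
  {:ok, resp} = :cowboy_req.reply(200, headers, body, req)
  {:ok, resp, state}
end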
I found a simple solution using only Cowboy and Elixir:
def call(conn) do
  serve(conn.req_path, conn)
end

defp serve(<<"/call/", name::binary>>, conn) do
  list = conn.req_qs
  IO.puts name
  {_, result} = JSON.encode(list)
  put_resp_body(conn, result)
end
If you do it like this, everything after /call/ stays in name, so deeper routes remain in that binary and a simple split recovers the segments. conn carries the query string, so I can get the GET values from there.
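For instance (name is the binary bound by the match above):

# "/call/foo/bar" matches with name == "foo/bar"
segments = String.split(name, "/")  # => ["foo", "bar"]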

How to reload code when the HTTP server is running?

When starting an HTTP server using HTTP.serve, there is apparently no way to reload the code that actually handles the HTTP requests.
In the example below I would like to have modifications to my_httphandler taken into account without having to restart the server.
For the moment I need to stop the server from the REPL by pressing CTRL+C twice and then run the script again.
Is there a workaround?
module MyModule

using HTTP
using Mux
using JSON
using Sockets

function my_httphandler(req::HTTP.Request)
    return HTTP.Response(200, "Hello world")
end

const MY_ROUTER = HTTP.Router()
HTTP.@register(MY_ROUTER, "GET", "/*", my_httphandler)

HTTP.serve(MY_ROUTER, Sockets.localhost, 8081)

end
I'm not sure whether Mux caches handlers. As long as it does not, this should work:
module MyModule

using HTTP
using Mux
using JSON
using Sockets

function my_httphandler(req::HTTP.Request)
    return HTTP.Response(200, "Hello world")
end

const functionref = Any[my_httphandler]

const MY_ROUTER = HTTP.Router()
# Register a closure that looks up the handler on every request, so that
# replacing functionref[1] later actually takes effect
HTTP.@register(MY_ROUTER, "GET", "/*", req -> functionref[1](req))

HTTP.serve(MY_ROUTER, Sockets.localhost, 8081)

end
function newhandler(req::HTTP.Request)
    return HTTP.Response(200, "Hello world 2")
end

MyModule.functionref[1] = newhandler
Revise.jl lets you automatically update code in a live Julia session. You may be especially interested in entr; see Revise's documentation for details.
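For instance, a rough sketch with entr (the watched path and the message are assumptions; the body re-runs whenever the listed files or their dependencies change):

using Revise

entr(["src/MyModule.jl"]) do
    @info "Sources changed; revised code is now live"
end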
When using HTTP.jl: just add @async before HTTP.serve
module MyModule

using HTTP
using Sockets

function my_httphandler(req::HTTP.Request)
    return HTTP.Response(200, "Hello world")
end

const MY_ROUTER = HTTP.Router()
HTTP.@register(MY_ROUTER, "GET", "/*", my_httphandler)

@async HTTP.serve(MY_ROUTER, Sockets.localhost, 8081)

end # module
When using Mux.jl: nothing to do, the server is started in the background
using Mux

function sayhellotome(name)
    return "hello " * name * "!!!"
end

@app test = (
    Mux.defaults,
    route("/sayhello/:user", req -> begin
        sayhellotome(req[:params][:user])
    end),
    Mux.notfound())

Mux.serve(test, 8082)
I've added ticket #587 to the HTTP.jl project for developer-workflow support; I'm not sure whether this covers your use case.
# hello.jl -- an example showing how Revise.jl works with HTTP.jl
# julia> using Revise; includet("hello.jl"); serve();

using HTTP
using Sockets

homepage(req::HTTP.Request) =
    HTTP.Response(200, "<html><body>Hello World!</body></html>")

const ROUTER = HTTP.Router()
HTTP.@register(ROUTER, "GET", "/", homepage)

serve() = HTTP.listen(request -> begin
    Revise.revise()
    Base.invokelatest(HTTP.handle, ROUTER, request)
end, Sockets.localhost, 8080, verbose=true)
Alternatively, you could have a test/serve.jl file, that assumes MyModule with a top-level HTTP.jl router is called ROUTER. You'll need to remove the call to serve in your main module.
#!/usr/bin/env julia

using HTTP
using Sockets
using Revise
using MyModule: ROUTER

HTTP.listen(request -> begin
    Revise.revise()
    Base.invokelatest(HTTP.handle, ROUTER, request)
end, Sockets.localhost, 8080, verbose=true)
A more robust solution would catch errors; however, I had challenges getting this to work and reported my experience at #541 in Revise.jl.

Unable to modify request in middleware using Scrapy

I am scraping public meteorological data for a project (data science), and to do that effectively I need to change the proxy used on my Scrapy requests whenever I get a 403 response code.
For this, I have defined a downloader middleware to handle this situation, which is as follows:
class ProxyMiddleware(object):
    def process_response(self, request, response, spider):
        if response.status == 403:
            f = open("Proxies.txt")
            proxy = random_line(f)  # Just returns a random line from the file with a valid structure ("http://IP:port")
            new_request = Request(url=request.url)
            new_request.meta['proxy'] = proxy
            spider.logger.info("[Response 403] Changed proxy to %s" % proxy)
            return new_request
        return response
After adding the class to settings.py, I expected this middleware to deal with 403 responses by generating a new request with the new proxy, eventually finishing with a 200 response. The observed behaviour is that the middleware actually gets executed (I can see the logger info about the changed proxy), but the new request does not seem to be made. Instead, I'm getting this:
2018-12-26 23:33:19 [bot_2] INFO: [Response] Changed proxy to https://154.65.93.126:53281
2018-12-26 23:33:26 [bot_2] INFO: [Response] Changed proxy to https://176.196.84.138:51336
... indefinitely with random proxies, which makes me think that I'm still retrieving 403 errors and the proxy is not changing.
Reading the documentation, regarding process_response, it states:
(...) If it returns a Request object, the middleware chain is halted and the returned request is rescheduled to be downloaded in the future. This is the same behavior as if a request is returned from process_request().
Is it possible that "in the future" is not "right after it is returned"? What should I do to change the proxy for all requests from that moment on?
Scrapy drops duplicate requests to the same URL by default, so that's probably what's happening in your spider. To check whether this is your case, you can set these settings:
DUPEFILTER_DEBUG=True
LOG_LEVEL='DEBUG'
To solve this you should add dont_filter=True:
new_request = Request(url=request.url, dont_filter=True)
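Putting it together, a sketch of the corrected middleware (random.choice replaces the unspecified random_line helper; request.replace keeps the original request's attributes while bypassing the duplicate filter):

import random

class ProxyMiddleware(object):
    def process_response(self, request, response, spider):
        if response.status == 403:
            with open("Proxies.txt") as f:
                proxy = random.choice(f.read().splitlines())
            # Copy of the original request that won't be dropped as a duplicate
            new_request = request.replace(dont_filter=True)
            new_request.meta['proxy'] = proxy
            spider.logger.info("[Response 403] Changed proxy to %s", proxy)
            return new_request
        return response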
Try this:
class ProxyMiddleware(object):
    def process_response(self, request, response, spider):
        if response.status == 403:
            f = open("Proxies.txt")
            proxy = random_line(f)
            new_request = Request(url=request.url)
            new_request.meta['proxy'] = proxy
            spider.logger.info("[Response 403] Changed proxy to %s" % proxy)
            return new_request
        else:
            return response
A better approach would be to use the scrapy-rotating-proxies module instead:

DOWNLOADER_MIDDLEWARES = {
    'rotating_proxies.middlewares.RotatingProxyMiddleware': 610,
    'rotating_proxies.middlewares.BanDetectionMiddleware': 620,
}
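The proxy list itself is supplied through a setting; a minimal sketch reusing the same file (ROTATING_PROXY_LIST_PATH is the module's setting name):

# settings.py
ROTATING_PROXY_LIST_PATH = 'Proxies.txt'  # one proxy per line, e.g. http://IP:port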

RSpec: Stub API call that sets global variable

In my Ruby (not Rails) program, I have created global variables in the top-level module. These global variables are set as the clients of external services, so my program makes API calls when they are set. I am trying to figure out how to properly stub these API calls in RSpec.
I would like to test a class inside the top module that looks more or less like this; Worker does not directly call the global variables anywhere in the class.
module TopModule
  class Worker
  end
end
Here is the TopModule:
module TopModule
  # (As an aside, the external service is AWS)
  $client = ExternalService::Client.new(ExternalService.config)
end
I would like to run the RSpec test of TopModule::Worker so it passes:
describe TopModule::Worker do
  it 'shows in various ways that Worker functions'
end
However, I get the following error: Real HTTP connections are disabled. Unregistered request: GET http://... with headers {...} (WebMock::NetConnectNotAllowedError)
The stack trace points to the line in TopModule where $client is defined.
I'm also told:
You can stub this request with the following snippet:
stub_request(:get, "http://...").
  with(:headers => {'Accept'=>'*/*', 'Accept-Encoding'=>'...', 'User-Agent'=>'Ruby'}).
  to_return(:status => 200, :body => "", :headers => {})
I still have the error when I add the stub to my spec/spec_helper RSpec.configure loop. Here are the relevant parts of the spec_helper:
require 'webmock/rspec'
require 'codeclimate-test-reporter'
WebMock.disable_net_connect!(allow: 'codeclimate.com')
require 'fileutils'
require 'top_module'

Dir['./spec/support/**/*.rb'].sort.each { |f| require f }

RSpec.configure do |config|
  config.mock_with :rspec do |mocks|
    mocks.verify_doubled_constant_names = true
    mocks.verify_partial_doubles = true
  end
end

def files_directory
  File.dirname(__FILE__) + '/files'
end
Where can I put the stub so it will actually handle the ExternalService API call? I would appreciate your help.
(This code is based on my real code, but not identical)
You can use VCR to record and stub the external API call:
https://github.com/vcr/vcr
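A minimal setup sketch (the cassette directory name is an assumption; VCR records the first real HTTP interaction to a cassette and replays it on later runs):

require 'vcr'

VCR.configure do |c|
  c.cassette_library_dir = 'spec/cassettes'  # where recorded HTTP interactions live
  c.hook_into :webmock                       # intercept requests at the WebMock layer
  c.configure_rspec_metadata!                # enables the :vcr tag on examples
end

Since your client is built when top_module is loaded, wrap that require in a cassette rather than relying on per-example tagging:

VCR.use_cassette('client_boot') do
  require 'top_module'
end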

Download large file with LuaSocket's HTTP module while keeping UI responsive

I would like to use LuaSocket's HTTP module to download a large file while displaying progress in the console and later on in a GUI. The UI must never block, not even when the server is unresponsive during the transfer. Additionally, creating a worker thread to handle the download is not an option.
Here's what I got so far:
local io = io
local ltn12 = require("ltn12")
local http = require("socket.http")

local fileurl = "http://www.example.com/big_file.zip"
local fileout_path = "big_file.zip"

local file_size = 0
local file_down = 0

-- counter filter used in ltn12
function counter(chunk)
  if chunk == nil then
    return nil
  elseif chunk == "" then
    return ""
  else
    file_down = file_down + #chunk
    ui_update(file_size, file_down) -- update ui, run main ui loop etc.
    return chunk -- return unmodified chunk
  end
end

-- first request: determine file size
local r, c, h = http.request {
  method = "HEAD",
  url = fileurl
}
file_size = h["content-length"]

-- second request: download file
r, c, h = http.request {
  method = "GET",
  url = fileurl,
  -- set our chain: count first, then write to file
  sink = ltn12.sink.chain(
    counter,
    ltn12.sink.file(io.open(fileout_path, "wb")) -- binary mode, so the zip isn't mangled
  )
}
There are a few problems with the above, ignoring error checking and hard-coding:

1. It requires two HTTP requests when one would do (a normal GET response also carries Content-Length).
2. If the server is unresponsive, the UI is unresponsive too, since the filter only gets called when there is data to process.

How can I do this while making sure the UI never blocks?
There is an example of non-preemptive multithreading in Programming in Lua that uses non-blocking LuaSocket calls and coroutines to do multiple parallel downloads. It should be possible to apply the same logic to your process to avoid blocking. I can only add that you should consider calling this logic from an IDLE event in your GUI (if there is such a thing) to avoid "attempt to yield across metamethod/c-call boundary" errors.
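A rough sketch of that approach, loosely following the book's download example (get, dispatch, and the header handling are simplified assumptions; real code would parse the response headers and route chunks through the existing counter filter):

local socket = require("socket")
local ltn12 = require("ltn12")

local threads = {}  -- live download coroutines

local function get(host, file, sink)
  local c = assert(socket.connect(host, 80))
  c:settimeout(0)  -- non-blocking: receive() returns immediately with whatever is there
  c:send("GET " .. file .. " HTTP/1.0\r\n\r\n")
  while true do
    local s, status, partial = c:receive(2^10)
    local chunk = s or partial
    if chunk and #chunk > 0 then sink(chunk) end  -- headers included; parse them in real code
    if status == "closed" then break end
    coroutine.yield()  -- would block here otherwise; give control back instead
  end
  c:close()
  sink(nil)  -- tell the sink the stream is finished
end

local function dispatch()
  while #threads > 0 do
    for i = #threads, 1, -1 do
      assert(coroutine.resume(threads[i]))
      if coroutine.status(threads[i]) == "dead" then
        table.remove(threads, i)
      end
    end
    -- ui_update(...) here: the UI gets a time slice on every pass,
    -- even when the server sends no data at all
  end
end

table.insert(threads, coroutine.create(function()
  get("www.example.com", "/big_file.zip",
      ltn12.sink.file(io.open("big_file.zip", "wb")))
end))
dispatch()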
