HTTP Client in DoFn

I would like to make a POST request from a DoFn in an Apache Beam pipeline running on Dataflow.
For that, I have created a client which instantiates a CloseableHttpClient configured with a PoolingHttpClientConnectionManager.
However, I currently instantiate a client for each element that I process.
How could I set up a persistent client shared by all my elements?
And is there another class I should use for parallel, high-speed HTTP requests?

You can put the client into a member variable, use the @Setup method to open it, and @Teardown to close it. Almost all IO implementations in Beam use this pattern, e.g. see JdbcIO.
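As a rough sketch of that pattern (not pulled from JdbcIO - the endpoint, payload handling, and pool sizes are placeholder assumptions), using Apache HttpComponents:

```java
import java.io.IOException;

import org.apache.beam.sdk.transforms.DoFn;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

public class PostFn extends DoFn<String, Integer> {

  // One client per DoFn instance, reused for every element this instance processes.
  private transient CloseableHttpClient client;

  @Setup
  public void setup() {
    PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
    cm.setMaxTotal(100);            // placeholder pool sizes - tune for your workload
    cm.setDefaultMaxPerRoute(20);
    client = HttpClients.custom().setConnectionManager(cm).build();
  }

  @ProcessElement
  public void processElement(@Element String payload, OutputReceiver<Integer> out) throws IOException {
    HttpPost post = new HttpPost("https://example.com/endpoint"); // placeholder URL
    post.setEntity(new StringEntity(payload, ContentType.APPLICATION_JSON));
    try (CloseableHttpResponse response = client.execute(post)) {
      out.output(response.getStatusLine().getStatusCode());
    }
  }

  @Teardown
  public void teardown() throws IOException {
    if (client != null) {
      client.close();
    }
  }
}
```

Because each DoFn instance is reused for many bundles on the same worker, the pooled client is shared across all elements that instance processes, and the pool handles concurrent connections.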

Related

SpringIntegration and Reactive: Trying to understand the constraints

We have a SpringIntegration workflow with restful HTTP inbound calls and outbound calls. The workflow is mostly expressed with XML declarations of channels, chains, a splitter and an aggregator.
In the Servlet realm, we use the http:inbound-gateway and http:outbound-gateway components for input/output to the internal workflow. This seems to work well using Spring Boot autoconfiguration for Tomcat/Jetty/Undertow.
We've been trying the Reactive realm, using webflux:inbound-gateway and webflux:outbound-gateway components on the same internal workflow. This seems to work OK for the Tomcat and Jetty servers, but we get no responses from Netty and some errors from Undertow. I have yet to discover why we are getting errors from the last two configurations.
What I'm wondering is if the same internal workflow can be hooked up to reactive or servlet components without requiring changes. We do use a splitter/aggregator, and my reading of the SpringIntegration documentation sections on WebFlux hasn't quite cleared up for me if these constructs can be used in both realms. ( https://docs.spring.io/spring-integration/reference/html/reactive-streams.html#splitter-and-aggregator )
Any pointers on this subject?
The webflux:inbound-gateway is the server side of the HTTP protocol and has to be used in a Reactive Streams HTTP server environment. I'm not sure about Undertow and Jetty, but Tomcat works in a simulating mode. I usually use io.projectreactor.netty:reactor-netty-http.
The webflux:outbound-gateway is the client side of the HTTP protocol. It is fully based on the WebClient, so it doesn't matter in what environment it is used.
The same applies to the splitter and aggregator components: they don't require any server implementation and don't expose any external ports, so there are no environment specifics to worry about. They can simply be used in a reactive stream definition as well as in regular flows.
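For what it's worth, a hedged sketch of the same inbound/outbound pairing in the Spring Integration Java DSL (the path, URL, channel names, and payload types below are made-up placeholders, not taken from the question's XML configuration):

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.HttpMethod;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.IntegrationFlows;
import org.springframework.integration.webflux.dsl.WebFlux;

@Configuration
public class ReactiveGatewayConfig {

  // Reactive server side: roughly the equivalent of <webflux:inbound-gateway path="/workflow"/>.
  @Bean
  public IntegrationFlow inbound() {
    return IntegrationFlows
        .from(WebFlux.inboundGateway("/workflow")
            .requestPayloadType(String.class))
        .channel("workflowInput")   // hand off to the existing internal workflow
        .get();
  }

  // Reactive client side: roughly the equivalent of <webflux:outbound-gateway url="..."/>.
  @Bean
  public IntegrationFlow outbound() {
    return IntegrationFlows
        .from("workflowOutput")
        .handle(WebFlux.outboundGateway("https://backend.example.com/api") // placeholder URL
            .httpMethod(HttpMethod.POST)
            .expectedResponseType(String.class))
        .get();
  }
}
```

The internal channels, splitter, and aggregator between "workflowInput" and "workflowOutput" would stay exactly as they are today; only the edges change between the http: and webflux: variants.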

Understanding websockets in terms of REST and Server vs Client Events

For a while now I have been implementing a RESTful API in my project's design because, in my case, it is very useful for others to be able to interact with the data in a consistent format (and I find REST to be a clean way of handling requests). I am now trying to have not only my current REST API for my resources, but also the ability to expose some pieces of information via a bidirectional WebSocket connection.
While searching for a good .NET library that implements the WebSocket protocol, I found out about SignalR. There were a few problems I had with it (maybe specific to my project?):
I want to be able to initialize a WebSocket connection through my existing REST API. (I don't know the proper practice for doing this, but I figured a custom header would work fine.) I would like the client to be able to close the connection and get an HTTP response back (101?) to signify its completion.
The problems I had with SignalR were:
that there was no clean way, outside of a hub instance, to get a user's connection id and map it to an external controller, where the REST call that was made determines which piece of data gets broadcast to the specific client (I don't want to use external memory)
the huge reliance on client-side code. I really want to make this process as simple as possible for the client and handle the majority of the work on the server side (which I had hoped modifying my current REST API would accomplish). The only responsibility I see for the client is to disconnect peacefully.
So now the question...
Is there a good server-side WebSocket library for .NET that implements the latest WebSocket protocol? The client can use any client library that adheres to the protocol. What is the best practice for incorporating both WebSocket connections and a RESTful API?
ASP.NET supports WebSockets itself if you have IIS 8 (Windows 8/2012 and later only). SignalR is just a polyfill.
If you do not have IIS8, you can use external WebSocket frameworks like mine: http://vtortola.github.io/WebSocketListener/
Cheers.

Rebus HTTP gateway and MSMQ health state

Let's say we have:
a client node with an HTTP gateway outbound service
a server node with an HTTP gateway inbound service
Consider the situation where MSMQ itself stops for some reason on the client node. In the current implementation, the Rebus HTTP gateway will catch the exception.
What do you think about the idea that, instead of just being caught, the MessageQueueException could also be sent to the server node and put on the error queue? (The name of the error queue could be taken from the headers.)
That way, without additional infrastructure, the server would know that the client has a problem, so someone could react.
UPDATE:
I guessed the problems described in the answer would be raised. I should have explained my scenario in more depth :) Sorry about that. Here it is:
I'm going to modify the HTTP gateway so that the InboundService is able to do both - send and receive messages. The OutboundService would then be the only one to initiate the connection (periodically, e.g. once every 5 minutes) in order to get new messages from the server and send its own messages to the server. That is because the client node is not considered a server but one of many clients sitting behind NAT.
Indeed, the server itself is not interested in client health, but I thought that instead of creating a separate alerting service on the client side which would reuse the HTTP gateway code, the HTTP gateway itself could do this, since having both sides running is very much the HTTP gateway's business.
What if the client can't reach the server at all?
Since MSMQ would be dead, I thought about using an in-process, standalone persistent queue object like this one: http://ayende.com/blog/4540/building-a-managed-persistent-transactional-queue (just an example implementation; I'm not sure what kind of license it has) to aggregate exceptions on the client side until the server is reachable.
And how often will the client notify the server that it has experienced an error?
I'm not sure about that part - I thought it could be tied to the scheduled message synchronization, like once every 5 minutes, but what about the case where there is no schedule, just like in the current implementation (a while(true) loop)? Maybe it could simply be set by config?
I like to have a consistent strategy about handling errors which usually involves plain old NLog logging
Since the client nodes will be on the Internet behind NAT, standard monitoring techniques won't work. I thought about using a queue as an NLog transport, but since MSMQ would be dead, that wouldn't work.
I also thought about using HTTP as an NLog transport, but on the server side it would require a queue (not really, but I would like to store it in a queue), so we are back to the service bus and the HTTP gateway... that kind of NLog transport would be a de facto clone of the HTTP gateway.
UPDATE 2: HTTP as an NLog transport (by transport I mean target) would also require a client-side queue, as I described in the "What if the client can't reach the server at all?" section. It would be a clone of the HTTP gateway embedded into NLog. Madness :)
The whole point is that the client is unreliable, so I want to have all the information about the client on the server side and log it there.
UPDATE 3
An alternative solution could be to create a separate service, which would nevertheless be part of the HTTP gateway (e.g. an OutboundAlertService). Then three goals would be fulfilled:
shared sending loop code
no additional server infrastructure required
no negative impact on OutboundService (no complexity of adding in-process queue to it)
It wouldn't take exceptions from the OutboundService; instead, it would check MSMQ periodically itself.
Yet another alternative would be simply to use a queue other than MSMQ as the NLog target, but that's ugly overkill.
Regarding your scenario, my initial thought is that it should never be the server's problem that a client has a problem, so I probably wouldn't send a message to the server when the client fails.
As I see it, there would be multiple problems/obstacles/challenges with that approach because, e.g., what if the client can't reach the server at all? And how often will the client notify the server that it has experienced an error?
Of course I don't know the details of your setup, so it's hard to give specific advice, but in general I like to have a consistent strategy for handling errors, which usually involves plain old NLog logging and configuring the WARN and ERROR levels to go to the Windows Event Log.
This allows for setting up various tools (e.g. System Center Operations Manager or similar) to monitor all of your machines' event logs and raise error flags when something goes wrong.
I hope I've said something you can use :)
UPDATE
After thinking about it some more, I think I'm beginning to understand your problem, and I think I would prefer a solution where the client lets the HTTP listener on the other end know that it's having a problem, and then the HTTP listener on the other end could (maybe?) log that as an error.
Another option is that the HTTP listener on the other end could have an event, ReceivedClientError or something, that one could attach to and then do whatever is right in the given situation.
In your case, you might put a message in an error queue. I would just avoid putting anything in the error queue as a general solution because I think it confuses the purpose of the error queue - the "thing" in the error queue wouldn't be a message, and as such it would not be retryable etc.

How can a Pinoccio lead scout make a POST request to a remote server?

I'd like my Pinocc.io lead scout to make a POST request (e.g. to inform a remote service of an event that has been triggered).
Note that I don't want to listen to a constant stream of results (as detailed here), as I don't want to be constantly connected to the HQ (I'm going to enable the Wi-Fi connection only when required to minimize battery usage), and the events I'm detecting are infrequent.
I would have thought that this is a very common use case, yet I can find no examples of the lead scout POSTing any messages.
I posted the same message directly on the Pinoccio website and I got this answer from an Admin
Out of the gate, that's not supported via HQ, mainly because to get as close to real-time performance as possible between API/HQ and a Lead Scout, it makes most sense to leave a TCP socket open continually and transfer data that way. HTTP, as you know, requires a connection, setup, transfer, and teardown upon each request.
However, that doesn't mean you can't get it working. In fact, you can do both if you wanted - leave the main TCP socket connected to HQ, and have a separate TCP client socket connect to any site/server you want and send whatever you like. It will require a custom Bootstrap, but you can then expose any aspect of that functionality to HQ/ScoutScript directly.
If you take a look at this code, that's the client object you'd use to open an HTTP connection.
So, in a nutshell, the lead scout cannot make a POST request out of the box. To do so, you'll need to create a custom bootstrap (e.g. using the Arduino IDE).

Implementing a background process responding to the client in an Atmosphere + Netty/Jetty application

We have a requirement to support 10k+ users, where every user initiates a request and waits for a response from the server (the response can take as long as 20-30 seconds to arrive). It is only one request from the client; after lengthy processing by the server, a response will be transmitted and then the connection will be closed.
In the background, the server will do some DB searches and wait for other background processes to signal completion before responding to the client.
After doing some research, I figured out we will need to use something like the Atmosphere framework to support WebSockets/SSE/long polling, along with an asynchronous server like Netty (=> Nettosphere) or Jetty.
As for my experience: mostly the Java EE world and the Tomcat server.
My questions are:
What would be easier to implement given my experience and our requirements: Atmosphere + Netty or Atmosphere + Jetty? Which one scales better, has an easier learning curve, and makes it easier to integrate other Java technologies?
How do you implement, in Atmosphere, a response that is sent only to the originating client and not broadcast to the rest of the clients? (All the examples I found broadcast.)
How can I implement our response flow in Netty (or Jetty) when using the Atmosphere framework? I.e., the client sends a request, some background processes run on the server after it is received, and when they finish I need to locate the connection and transmit the response. Is that achievable?
Some thoughts:
At 10k+ users, with 20-30 second response latency, you will likely hit file descriptor limits if you use just one network interface. Consider a solution that uses multiple network interfaces.
Your description of your request/response can be handled entirely with standard Servlet 3.0, standard HTTP/1.1, async request handling, and large timeouts (a rough sketch of that approach follows these notes).
If your clients are web browsers, and you don't start sending a response from the server until the 20-30 second window, you might hit browser idle timeouts.
Atmosphere and CometD do the same things, supporting long-duration connections, with connection technique fallbacks and logical channel APIs.
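To make the Servlet 3.0 point concrete, here is a minimal sketch of async request handling with a long timeout; the URL pattern, executor, and fixed delay below stand in for your DB search and background completion notifications (placeholder assumptions, not from the question):

```java
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import javax.servlet.AsyncContext;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet(urlPatterns = "/work", asyncSupported = true)
public class LongWorkServlet extends HttpServlet {

  // Placeholder for whatever mechanism signals that the background work is done.
  private final ScheduledExecutorService background = Executors.newScheduledThreadPool(4);

  @Override
  protected void doGet(HttpServletRequest req, HttpServletResponse resp) {
    AsyncContext ctx = req.startAsync();            // release the container thread immediately
    ctx.setTimeout(TimeUnit.SECONDS.toMillis(60));  // allow for the 20-30 s processing window

    // Simulate the DB search / external notification finishing later.
    background.schedule(() -> {
      try {
        HttpServletResponse asyncResp = (HttpServletResponse) ctx.getResponse();
        asyncResp.setContentType("text/plain");
        asyncResp.getWriter().write("done");
      } catch (IOException e) {
        // log and fall through to complete()
      } finally {
        ctx.complete();                             // finish the exchange for this one client
      }
    }, 25, TimeUnit.SECONDS);
  }
}
```

Because the AsyncContext is tied to the originating request, the response naturally goes only to that client, which also speaks to the "no broadcast" question above.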
I believe the Akka framework will handle this sort of need. I am looking at using it to handle scaling issues, possibly with RabbitMQ to help offload work to other servers that may be added later to scale as needed.
