Using NGINX to forward tracking data to Flume

I am working on providing analytics for our web property based on instrumentation data we collect via a simple image beacon. Our data pipeline starts with Flume, and I need the fastest possible way to parse query string parameters, form a simple text message and shove it into Flume.
For performance reasons, I am leaning towards nginx. Since serving a static image from memory is already supported, my task is reduced to handling the query string and forwarding a message to Flume. Hence, the question:
What is the simplest reliable way to integrate nginx with Flume? I am thinking about using syslog (Flume supports syslog listeners), but I struggle with how to configure nginx to forward custom log messages to a syslog (or plain TCP) listener running on a remote server and on a custom port. Is this possible with existing 3rd-party modules for nginx, or would I have to write my own?
Separately, if you can recommend anything existing for writing a fast $args parser, that would be much appreciated.
If you think I am on a completely wrong path and can recommend something better performance-wise, feel free to let me know.
Thanks in advance!

You should parse the nginx log file the way tail -f does and then pass the results to Flume. That will be the simplest and most reliable way. The problem with syslog is that it blocks nginx and may get completely stuck under high load or if something goes wrong (which is why nginx doesn't support it).
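If you go the tail-the-log route, the forwarder itself is small in most languages. Below is a minimal sketch in Go; the log path, the flume-host:5140 address and the assumption of a newline-delimited TCP source on the Flume side (e.g. a netcat-style source) are all placeholders to adjust. It follows the access log like tail -f and ships one log line per event, leaving the $args parsing to the consumer.

package main

import (
	"bufio"
	"io"
	"log"
	"net"
	"os"
	"time"
)

func main() {
	// Hypothetical paths/endpoints; adjust to your environment.
	const logPath = "/var/log/nginx/access.log"
	const flumeAddr = "flume-host:5140" // e.g. a newline-delimited TCP source in Flume

	f, err := os.Open(logPath)
	if err != nil {
		log.Fatal(err)
	}
	f.Seek(0, io.SeekEnd) // start at the end of the file, like tail -f

	conn, err := net.Dial("tcp", flumeAddr)
	if err != nil {
		log.Fatal(err)
	}

	r := bufio.NewReader(f)
	line := ""
	for {
		chunk, err := r.ReadString('\n')
		line += chunk
		if err == io.EOF {
			// No complete line yet; wait for nginx to write more.
			time.Sleep(500 * time.Millisecond)
			continue
		}
		if err != nil {
			log.Fatal(err)
		}
		// Forward one complete access-log line per Flume event.
		if _, err := conn.Write([]byte(line)); err != nil {
			log.Fatal(err)
		}
		line = ""
	}
}

Note that this sketch ignores log rotation and reconnection; in practice, if your Flume version ships an exec source, running tail -F under it covers that side for you.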

Related

AMQP/RabbitMQ consumer on NGINX

Is it possible to have a RabbitMQ consumer listening to a queue for messages via the AMQP protocol? I am aware that nginx only supports the HTTP/S protocol. I was wondering if this can be achieved by using a TCP module extension.
I am using nginx as an API gateway and want to do a protocol translation from AMQP to HTTP, since all the backend services are exposed over HTTP.
It would definitely be possible by writing your own C extension. nginx is suitable for TCP proxying, so I don't see any reason why you couldn't send your own TCP packets to RabbitMQ using nginx, and consequently use nginx as a RabbitMQ consumer. It's probably a lot of work to make it run, and even more work to make it stable and reliable, but doable. Do me a favor though: don't do this. There will always be a better, more elegant and simpler solution.
HTTP is definitely not suitable for consuming from a queue (in the AMQP sense) because you have to keep the socket open while you consume. However, you could write a C extension to publish/retrieve messages to/from RabbitMQ (and apparently, somebody has already done this). If you're not that much into C or don't want to maintain your own nginx package, you could also write a Lua extension for lua-nginx-module (once again, somebody seems to have worked in this direction). These are PoCs for talking to MQ from nginx, but they are not consumers. Both extensions seem to act in the HTTP context, so you need to answer (and close the socket) pretty fast.
However, as far as I know, there isn't any community-driven and well maintained project that would serve this purpose directly or indirectly; you'd have to make and maintain your own extension/client. Moreover, nginx is your current API gateway. Do take the risk into account. Things could go really wrong. Only you can tell whether it is worth the hassle or not, but it's most likely not.
Since you didn't give that much information on what exactly you're looking for, I have only answered the NGINX/AMQP part. But you might just be looking for an HTTP interface for RabbitMQ. In this case, the Management Plugin might be the way to go. It has a pretty cool HTTP API. Once again, you'd lose every stateful feature (like basic consuming, ack/nack/reject), but that's inherently due to the way HTTP is designed.
Eventually, if you really need a RabbitMQ "basic-"consumer, I would recommend writing a proper consumer as a separate application and forgetting about doing this in nginx. That's definitely the best and most supported solution.
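To make that last suggestion concrete, here is a rough sketch of such a standalone consumer in Go using the streadway/amqp client, doing the AMQP-to-HTTP translation the question mentions. The broker URL, queue name and backend endpoint are made up, and error handling is kept minimal.

package main

import (
	"bytes"
	"log"
	"net/http"

	"github.com/streadway/amqp"
)

func main() {
	// Hypothetical broker URL, queue name and backend endpoint.
	conn, err := amqp.Dial("amqp://guest:guest@localhost:5672/")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	ch, err := conn.Channel()
	if err != nil {
		log.Fatal(err)
	}

	// Manual acks, so a failed HTTP call can requeue the message.
	msgs, err := ch.Consume("orders", "", false, false, false, false, nil)
	if err != nil {
		log.Fatal(err)
	}

	for m := range msgs {
		// Translate AMQP -> HTTP: POST the message body to the backend service.
		resp, err := http.Post("http://backend.internal/orders", "application/json",
			bytes.NewReader(m.Body))
		if err != nil {
			m.Nack(false, true) // requeue on failure
			continue
		}
		resp.Body.Close()
		if resp.StatusCode >= 300 {
			m.Nack(false, true)
			continue
		}
		m.Ack(false)
	}
}

nginx stays a plain HTTP gateway in front of the backends, and this small process owns the long-lived AMQP connection and the ack/nack semantics that HTTP can't express.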

Serving two websites written in Google Go within a single VM

I have a VM from Digital Ocean. It currently has two domains linked to the VM.
I do not use any other web server, only Golang's built-in http module. Performance-wise I like it, and I feel like I have full control over it.
Currently I am using a single Go program that has multiple websites built in.
http.HandleFunc("test.com/", serveTest)
http.HandleFunc("123.com/", serve123)
http.HandleFunc("/", serve123)
As they are websites, the Go program is using port 80 for that.
The problem is that when I try to update only one website, I have to recompile the whole thing, as they are written in the same code base.
1) Is there a way to make it hot-swappable with Golang only (without Nginx or Apache)?
2) What would be a standard best practice?
Thank you so much!
Well, you can do hot swapping in Go, but I really wouldn't want to do that unless really necessary, as the added complexity isn't negligible (and I'm not just talking about code).
You can get something close with a kind of proxy that sits in front of the program and does a graceful swap whenever your binary changes: the principle is to have the binary on one port and the proxy on another. When a new binary is ready, you run it on a different port, make the proxy redirect to the new port, then gracefully shut down the old one.
There was a tool for that in Go that I can't remember the name of…
EDIT: not the one I had in mind, but close call https://github.com/rcrowley/goagain
Personal advice: use a reverse proxy for that; it's much simpler to do. My personal setup is to use h2o to terminate SSL, HTTP/2, etc., and send the requests to the various websites running in the background. Not only Go ones, though, but also PHP ones, a GitLab instance, etc. It's much more flexible, and the performance penalty of the proxy is small.
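To tie that back to the original two-domain setup: each site becomes its own binary on its own port, and a small front process routes by Host header, so one site can be rebuilt and restarted without touching the other. A minimal sketch of that front proxy in Go (the backend ports are arbitrary; in practice h2o or nginx would play this role just as well):

package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// proxyTo returns a handler that forwards requests to one backend site.
func proxyTo(raw string) http.Handler {
	u, err := url.Parse(raw)
	if err != nil {
		log.Fatal(err)
	}
	return httputil.NewSingleHostReverseProxy(u)
}

func main() {
	mux := http.NewServeMux()
	// Each site is its own process on its own port, so it can be
	// rebuilt and restarted without touching the other one.
	mux.Handle("test.com/", proxyTo("http://127.0.0.1:8001"))
	mux.Handle("123.com/", proxyTo("http://127.0.0.1:8002"))
	mux.Handle("/", proxyTo("http://127.0.0.1:8002"))

	log.Fatal(http.ListenAndServe(":80", mux))
}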

How to configure GWAN as a reverse proxy?

I saw some performance figures for G-WAN and am interested in testing it as a reverse proxy for static content in front of Apache with APC for optimizing PHP opcode, to run a WordPress multisite. I can get G-WAN up and running, but I have no idea how to configure it as a reverse proxy, as there seems to be almost no information on it. Has anyone used G-WAN as a reverse proxy?
It's still an undocumented feature... maybe in a next release? #gil?
Right now there's no easy way for you to do that. That will change with the next release.
We first hardcoded the reverse-proxy feature in G-WAN along with the load-balancer. Then, as we needed to personalize reverse-proxying, we implemented it as a protocol handler script.
Protocol handler scripts allow users to implement any protocol (like SMTP, LDAP, etc.) without having to deal with multi-threading or socket events.
But finally, to reduce complexity for users, we might revert to the hard-coded implementation with connection handler scripts to let people personalize the reverse proxy.
It's maturing under different use cases, hence the delay in publicly releasing this feature and a few others.
Rushing to implement features and interfaces is not always optimal, if the goal is to stay flexible and easy to use.

Proxies for WebDAV

I'd like to set up a reverse proxy for my WebDAV server. The main reason for this is so that I can better control which files are being uploaded to the WebDAV server. I cannot do this at the WebDAV server itself; it's a service provided by Alfresco and I have no idea whether or not it's possible to configure the WebDAV service at all.
In particular I'd like to prevent my Mac from doing the AppleDouble thingy on the WebDAV server, i.e. stop my Mac from uploading ._* files for every real file I upload. There is, as far as I know, no way to stop my Mac from attempting this.
Does the proxy server need to do more than merely relay HTTP requests back and forth? Does it also need to know something about WebDAV for this to work?
Which proxy servers could you recommend for this?
Günther
Unless I'm missing something, a reverse proxy will have to rewrite header fields (such as Destination: and If:) to work properly and potentially even request/response bodies, and thus is unlikely to work well.
A "proper" proxy shouldn't get in the way, though.
You could do this with SabreDAV. It has a TemporaryFileFilter Plugin that does exactly what you need. Not only does it intercept these resource forks, it also places them in a temporary 'quarantine'. This is important, because OS/X will check if the file was successfully written and fail horribly otherwise.
There will be two things you still need to do to make this work though:
Automatic cleanup of these files (a script suitable for cron is also supplied).
The actual proxy bit. This means you'll have to implement a Collection and a File class that perform the HTTP requests.
Disclaimer: I authored SabreDAV

Node.JS: Converting tcp to stdin/stdout

Node.JS seems limited in its ability to live-update code and in its ability to automatically isolate exceptions, both of which come practically by default in Java.
One very effective way to live-update is to have a listener process that simply echoes communication to/from the child process. To update, the listener starts up a new child (which reads the updated code automatically) and then starts sending requests to the new child, ending the old child when all requests are complete.
Is there already a system that provides this HTTP functionality through stdout/stdin?
Is there a system that provides TCP server or UDP server functionality through stdout/stdin?
By this I mean, providing a module that looks like the http or net module with the exception that it uses stdout/stdin for the underlying I/O.
Similar to this CGI module:
some applications would only have to change require('http') to require('cgi').
I intend to do something similar. I hope to reuse code if it is already out there, and also to easily convert a small or single-purpose webserver into this listener layer that runs many webapps. It is important that cleanup occurs properly: connections that end or error should be freed up, and the end/error events/commands should be properly echoed both ways.
(I believe a common way is to have the children listen on ports and the parent communicate with those ports, but I think an stdout/stdin solution will be more efficient)
Use nginx (HttpUpstreamModule) or HAProxy. In both cases you'd run them in front and mark a backend as down and then bring it back up when you need to do a live upgrade.
I'm not certain that this is what you're looking for (indeed, I'm not certain that I understand your question), but Remy Sharp has written a very helpful node module called nodemon. It promises to "monitor for any changes in your node.js application and automatically restart the server." This may help with the issue of live updating code.
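For what it's worth, the core of what the question asks for (HTTP framing carried over stdin/stdout instead of a socket) is small. Here is a rough, language-agnostic illustration, sketched in Go rather than Node, with a made-up handler standing in for the real application:

package main

import (
	"bufio"
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
	"os"
)

// handler stands in for the real application handler.
func handler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintf(w, "served %s over stdin/stdout\n", r.URL.Path)
}

func main() {
	br := bufio.NewReader(os.Stdin)
	for {
		// Read one HTTP/1.x request framed on stdin.
		req, err := http.ReadRequest(br)
		if err == io.EOF {
			return // the parent closed the pipe
		}
		if err != nil {
			log.Fatal(err)
		}
		// Run the normal handler and capture its response.
		rec := httptest.NewRecorder()
		handler(rec, req)
		// Write the response back to stdout, keeping the HTTP framing.
		if err := rec.Result().Write(os.Stdout); err != nil {
			log.Fatal(err)
		}
	}
}

The listener/parent side then shuttles bytes between its accepted connections and each child's stdin/stdout, which is where the multiplexing and cleanup concerns from the question actually live.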
