ngx lua: scope of local variable, init in init_by_lua_block - nginx

I'm new to nginx Lua and inherited a setup from a previous developer. I'm trying to go through the docs to understand the scoping, but I'm pretty unsure.
It currently looks like this:
init_by_lua_block {
    my_module = require 'my_module'
    my_module.load_data()
}

location / {
    content_by_lua_block {
        my_module.use_data()
    }
}
And in my_module
local _M = {}
local content = {}

function _M.use_data()
    -- access content variable
end

function _M.load_data()
    -- code to load json data into content variable
end

return _M
So my understanding is that content is a local variable, so its lifetime should be within each request. However, it's being initialized in init_by_lua_block and used by the module's other functions, which confuses me. Is this good practice? And what's the actual lifetime of this content variable?
Thanks a lot for reading.

Found this: https://github.com/openresty/lua-nginx-module#data-sharing-within-an-nginx-worker
To globally share data among all the requests handled by the same nginx worker process, encapsulate the shared data into a Lua module, use the Lua require builtin to import the module, and then manipulate the shared data in Lua. This works because required Lua modules are loaded only once and all coroutines will share the same copy of the module (both its code and data). Note however that Lua global variables (note, not module-level variables) WILL NOT persist between requests because of the one-coroutine-per-request isolation design.
Here is a complete small example:
-- mydata.lua
local _M = {}

local data = {
    dog = 3,
    cat = 4,
    pig = 5,
}

function _M.get_age(name)
    return data[name]
end

return _M
and then accessing it from nginx.conf:
location /lua {
    content_by_lua_block {
        local mydata = require "mydata"
        ngx.say(mydata.get_age("dog"))
    }
}

init_by_lua[_block] runs at the nginx configuration-loading phase, before the worker processes are forked, so whatever it loads is inherited by every worker.
Because content is a module-level local in a required module (not a Lua global), it persists for the lifetime of the worker process and is the same in every request that worker handles.
https://github.com/openresty/lua-nginx-module/#init_by_lua
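Putting the pieces together, a minimal sketch of how my_module could look; the JSON file path and the use of the bundled cjson library are assumptions for illustration, not the original code:
-- my_module.lua (illustrative sketch)
local cjson = require "cjson"     -- assumed available in the OpenResty bundle

local _M = {}
local content = {}                -- module-level local: one copy per worker

function _M.load_data()
    -- hypothetical file path; called once from init_by_lua_block before workers fork
    local f = assert(io.open("/etc/my_app/data.json", "r"))
    local raw = f:read("*a")
    f:close()
    content = cjson.decode(raw)
end

function _M.use_data()
    -- every request handled by the same worker sees the same table
    return content
end

return _M
Because the required module is cached, content survives across requests; treat it as read-only in request handlers, since any writes would only be visible within the worker that made them.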

Related

Lua - Download file asynchronously via HTTP

I just finished reading the copas core code, and I want to write code to download a file from a website asynchronously, but copas seems to only support socket IO.
Since Lua does not provide async syntax, other packages will surely have their own event loop which, I think, cannot run alongside copas' loop.
So to asynchronously download a file via HTTP, do I have to find a package that supports async HTTP and async file IO at the same time? Or are there any other ideas?
After reading a bunch of code, I can finally answer my own question.
As I mentioned in my comment on the question, one can make use of the step function exported by an async IO library and merge the individual stepping calls into one bigger loop.
In the case of luv, it uses an external thread pool in C to manage file IO and a single-threaded loop to call pending callbacks and manage IO polling (polling is not needed in my use case).
One can simply call the file operation functions provided by luv to do async file IO, but you still need to step luv's loop to run the callbacks bound to those IO operations.
The integrated main loop goes like this:
local function main_loop()
    copas.running = true
    while not copas.finished() or uv.loop_alive() do
        if not copas.finished() then
            copas.step()
        end
        if uv.loop_alive() then
            uv.run("nowait")
        end
    end
end
copas.step() is the stepping function of copas, and uv.run("nowait") makes luv run just one pass of its event loop without blocking if there is no ready IO when polling.
A working solution looks like this:
local copas = require "copas"
local http = require "copas.http"
local uv = require "luv"

local urls = {
    "http://example.com",
    "http://example.com"
}

local function main_loop()
    copas.running = true
    while not copas.finished() or uv.loop_alive() do
        if not copas.finished() then
            copas.step()
        end
        if uv.loop_alive() then
            uv.run("nowait")
        end
    end
end

local function write_file(file_path, data)
    -- ** call to luv async file IO **
    uv.fs_open(file_path, "w+", 438, function(err, fd)
        assert(not err, err)
        uv.fs_write(fd, data, nil, function(err_o, _)
            assert(not err_o, err_o)
            uv.fs_close(fd, function(err_c)
                assert(not err_c, err_c)
                print("finished:", file_path)
            end)
        end)
    end)
end

local function dl_url(url)
    local content, _, _, _ = http.request(url)
    write_file("foo.txt", content)
end

-- adding tasks to copas' loop
for _, url in ipairs(urls) do
    copas.addthread(dl_url, url)
end

main_loop()

Reading JS library from CDN within Mirth

I'm doing some testing around Mirth Connect. I have a test channel where the data types are Raw for the source and the one destination. The destination is not doing anything right now. In the source, the connector type is JavaScript Reader, and the code is doing the following...
var url = new java.net.URL('https://cdnjs.cloudflare.com/ajax/libs/lodash.js/4.17.15/lodash.fp.min.js');
var conn = url.openConnection();
conn.setRequestMethod('GET');
if (conn.getResponseCode() === 200) {
    var body = org.apache.commons.io.IOUtils.toString(conn.getInputStream(), 'UTF-8');
    logger.debug('CONTENT: ' + body);
    globalMap.put('_', body);
}
conn.disconnect();

// This code is in the source but was also tested in the destination
logger.debug('FROM GLOBAL: ' + $('_')); // library was found
var arr = [1, 2, 3, 4];
var _ = $('_');
var newArr = _.chunk(arr, 2);
The error I'm getting is: TypeError: Cannot find function chunk in object.
The reason I want to do this is to build custom/internal libraries with unit tests, serve them from an internal/company CDN, and allow Mirth to consume them.
How can I make the library available to Mirth?
Rhino actually has CommonJS support, but Mirth doesn't have it enabled by default. Here's how you can use it in your channel.
channel deploy script
with (JavaImporter(
    org.mozilla.javascript.Context,
    org.mozilla.javascript.commonjs.module.Require,
    org.mozilla.javascript.commonjs.module.provider.SoftCachingModuleScriptProvider,
    org.mozilla.javascript.commonjs.module.provider.UrlModuleSourceProvider,
    java.net.URI
)) {
    var require = new Require(
        Context.getCurrentContext(),
        this,
        new SoftCachingModuleScriptProvider(new UrlModuleSourceProvider([
            // Search path. You can add multiple URIs to this array
            new URI('https://cdnjs.cloudflare.com/ajax/libs/lodash.js/4.17.15/')
        ], null)),
        null,
        null,
        true
    );
} // end JavaImporter

var _ = require('lodash.min');
require('lodash.fp.min')(_); // convert lodash to fp
$gc('_', _);
Note: There's something funky with the cdnjs lodash fp packages: they don't detect the environment correctly and force that weird two-stage import. If you use https://cdn.jsdelivr.net/npm/lodash@4.17.15/ instead, you only need to do var _ = require('fp'); and it loads everything in one step.
transformer
var _ = $gc('_');
logger.info(JSON.stringify(_.chunk(2)([1,2,3,4])));
Note: This is the correct way to use fp/chunk. In your OP you were calling it with the standard chunk syntax.
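For comparison, the two calling conventions side by side (same result either way):
// standard lodash:  data first, size second
_.chunk([1, 2, 3, 4], 2);     // -> [[1, 2], [3, 4]]
// lodash/fp:        curried, size first, data last
_.chunk(2)([1, 2, 3, 4]);     // -> [[1, 2], [3, 4]]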
Additional Commentary
I think it's probably ok to do it this way where you download the library once at deploy time and store it in the globalChannelMap, then retrieve it from the map where needed. It would probably also work to store the require object itself in the map if you wanted to call it elsewhere. It will cache and reuse the object created for future calls to the same resource.
I would not create new Require objects anywhere but the deploy script, or you will be redownloading the resource on every message (or every poll in the case of a Javascript Reader.)
Edit: I guess for an internal web host, this could be desirable in a JavaScript Reader if you intend for it to pick up changes immediately on the next poll without a redeploy, assuming you would be upgrading the library in place instead of incrementing a version.
The benefit of using Code Templates, as Vibin suggested, is that they get compiled directly into your channel at deploy time and there is no additional fetching step at runtime. Making the library available is as simple as assigning it to your channel.
Even though importing third-party libraries could be an option, I was actually looking into this so our team could write our own custom functions, write unit tests for them, and finally be able to pull that code into Mirth. I was experimenting with lodash, but using it was not my end goal. My solution was to do a REST GET call with Java in the global script, where the URL is the raw GitHub URL of the code you want to pull in. It's the same code as in my original question, but, like I said, the URL is the raw GitHub URL for the function I want to pull in.
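As a rough sketch of that approach (the URL, the map key, and the eval step are my assumptions; the poster only describes fetching the raw source in the global script):
// Global deploy script (Rhino). The URL and variable names are hypothetical.
var url = new java.net.URL('https://raw.githubusercontent.com/yourorg/yourrepo/master/myFunctions.js');
var conn = url.openConnection();
conn.setRequestMethod('GET');
if (conn.getResponseCode() === 200) {
    var source = org.apache.commons.io.IOUtils.toString(conn.getInputStream(), 'UTF-8');
    // Evaluate the fetched source so the map holds callable functions rather than a raw string.
    // This assumes myFunctions.js defines a top-level object named `myFunctions`.
    eval(source);
    globalMap.put('myFunctions', myFunctions);
}
conn.disconnect();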

Send a large file with HTTP.jl

I would like to implement a server with HTTP.jl and Julia. After some computation the server would return a "large" file (several hundred MB). I would like to avoid having to read the whole file into memory and then send it to the client.
Some frameworks have a specific function for this (e.g. Flask's http://flask.pocoo.org/docs/0.12/api/#flask.send_file) or allow streaming the content to the client (http://flask.pocoo.org/docs/0.12/patterns/streaming/).
Is one of these two options also available in HTTP.jl, or in any other Julia web package?
Here is a test snippet which reads the file testfile.txt, but I want to avoid loading the complete file into memory.
import HTTP

f = open("testfile.txt","w")
write(f,"test")
close(f)

router = HTTP.Router()

function testfun(req::HTTP.Request)
    f = open("testfile.txt")
    data = read(f)
    close(f)
    return HTTP.Response(200,data)
end

HTTP.register!(router, "GET", "/testfun", HTTP.HandlerFunction(testfun))
server = HTTP.Servers.Server(router)
task = @async HTTP.serve(server, ip"127.0.0.1", 8000; verbose=false)
sleep(1.0)

req = HTTP.request("GET","http://127.0.0.1:8000/testfun/")

# end server
put!(server.in, HTTP.Servers.KILL)

@show String(req.body)
You can use memory mapped IO like this:
using Mmap   # Mmap is a standard library; the explicit import is needed on Julia 0.7+

function testfun(req::HTTP.Request)
    data = Mmap.mmap(open("testfile.txt"), Array{UInt8,1})
    return HTTP.Response(200, data)
end
data now looks like a normal byte array to Julia, but it is actually backed by the file, which might be exactly what you want. The file will only be closed upon garbage collection; if you serve many requests and no garbage collection is triggered, you might end up with a lot of open files. If your request takes quite long anyway, you might consider calling gc() at the beginning of the request.
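If the open-file buildup is a concern, one variation (my assumption, not part of the original answer) is to close the handle right after mapping; on POSIX systems the mapping remains valid after the descriptor is closed:
function testfun(req::HTTP.Request)
    io = open("testfile.txt")
    data = Mmap.mmap(io, Array{UInt8,1})
    close(io)   # the mmap'ed pages stay accessible; only the descriptor is released
    return HTTP.Response(200, data)
end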

How to send Erlang function source to Riak MapReduce via HTTP?

I'm trying to use Riak's MapReduce via HTTP. This is what I'm sending:
{
    "inputs":{
        "bucket":"test",
        "key_filters":[["matches", ".*"]]
    },
    "query":[
        {
            "map":{
                "language":"erlang",
                "source":"value(RiakObject, _KeyData, _Arg) -> Key = riak_object:key(RiakObject), Count = riak_kv_crdt:value(RiakObject, <<\"riak_kv_pncounter\">>), [ {Key, Count} ]."
            }
        }
    ]
}
Riak fails with "[worker_startup_failed]", which isn't very informative. Could anyone please help me get this to actually execute the function?
WARNING
Allowing arbitrary Erlang functions via map-reduce is a security risk. Any valid Erlang can be executed, including sending your entire data set offsite or formatting the hard drive.
You have been warned.
However, if you implicitly trust any client that may connect to your cluster, you can allow Erlang source to be passed in a map-reduce request by setting {allow_strfun, true} in the riak_kv section of app.config (or in advanced.config if you are using riak.conf).
Once you have allowed passing an Erlang function in a map-reduce phase, you need to pass in a function of the form fun(RiakObject,KeyData,Arg) -> [result] end. Note that this must be an anonymous fun, so fun is a keyword, not a name, and it must end with end.
Your function should handle the case where {error,notfound} is passed as the first argument instead of an object. Simply adding a catch-all clause to the function could accomplish that.
Perhaps something like:
{
    "inputs":{
        "bucket":"test",
        "key_filters":[["matches", ".*"]]
    },
    "query":[
        {
            "map":{
                "language":"erlang",
                "source":"fun(RiakObject, _KeyData, _Arg) ->
                              Key = riak_object:key(RiakObject),
                              Count = riak_kv_crdt:value(
                                  RiakObject,
                                  <<\"riak_kv_pncounter\">>),
                              [ {Key, Count} ];
                             (_,_,_) -> [{error,0}]
                          end."
            }
        }
    ]
}
Allowing the source to be passed in the request is very useful while developing and debugging. For production, you really should put the functions in a dedicated pre-compiled module that you copy to the code path of each node so that the phase spec can specify the module and function by name instead of providing arbitrary code.
{"map":{
    "language":"erlang",
    "module":"yourprecompiledmodule",
    "function":"functionname"}}
You need to enable allow_strfun on all nodes in your cluster. To do so in Riak 2, you will need to use the advanced.config file to add this to the riak_kv configuration:
[
    {riak_kv, [
        {allow_strfun, true}
    ]}
].
The other option is to create your own Erlang module by using the compiler shipped with Riak and placing the *.beam file in a well-known location for Riak to find. The basho-patches directory is one such place.
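For illustration, such a module might look like this (the module and function names are just the placeholders from the phase spec shown earlier, and the body mirrors the strfun from the earlier example):
%% yourprecompiledmodule.erl
-module(yourprecompiledmodule).
-export([functionname/3]).

%% Handle the not-found case first, then the normal object.
functionname({error, notfound}, _KeyData, _Arg) ->
    [];
functionname(RiakObject, _KeyData, _Arg) ->
    Key = riak_object:key(RiakObject),
    Count = riak_kv_crdt:value(RiakObject, <<"riak_kv_pncounter">>),
    [{Key, Count}].
Compile it and copy the resulting .beam file to the code path of each node, as described above.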
Please see the documentation as well:
advanced.config
Installing custom Erlang code
HTTP MapReduce
Using MapReduce
Advanced MapReduce
MapReduce / curl example

Sharing data with blocks

I have a page that displays some data. The source of the data is not Drupal nodes, so Views is of no use to me:
function mymodule_main_page($arg1, $arg2, $arg3) {
  $results = call_remote_api_and_get_lots_of_results($arg1, $arg2, $arg3);
  return theme('mymodule_page', $results, $arg1, $arg2, $arg3);
}
My module also displays a block. The block's purpose is to summarize the results that were returned in the main page content (e.g. Number of results: X, Number of pages: Y, etc.).
/**
 * Implementation of hook_block().
 */
function mymodule_block($op = 'list', $delta = 0, $edit = array()) {
  switch ($op) {
    case 'view':
      if ($delta == 0) {
        $block['subject'] = t('Results summary');
        $block['content'] = theme('mymodule_results_summary');
      }
      break;
  }
  return $block;
}
I need to avoid generating the results again. What is the best way for my block to access the results object returned in the function that drew the main page? Global or Static vars? Is there a module that exists that already attempts to solve this problem?
A very good and flexible solution is to use the Drupal core functions cache_set and cache_get, as ya.teck mentioned, but extend them with the Cache Router module. You can specify cache storage engines and use memcache or shared memory for your cache, so the data is not stored in the database and access is very fast.
In addition to the cache system that ya.teck mentions, a simpler way is to cache the entire block for x minutes, hours, or days. Drupal has a built-in cache system for all blocks; you can see some of the settings at admin/settings/performance.
Update:
The Drupal way, in both core and contrib, is to use a static variable (an array or the actual variable) and store the result of the heavy lifting there. An example is node_load(), which stores all loaded nodes in a static array so each node only needs to be loaded once during each request.
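A minimal sketch of that static pattern applied to this module (the helper function name and cache key are hypothetical):
function mymodule_get_results($arg1, $arg2, $arg3) {
  // Static cache: the remote API is called at most once per argument set per request.
  static $results = array();
  $key = $arg1 . ':' . $arg2 . ':' . $arg3;
  if (!isset($results[$key])) {
    $results[$key] = call_remote_api_and_get_lots_of_results($arg1, $arg2, $arg3);
  }
  return $results[$key];
}
Both the page callback and the block's theme function can then call mymodule_get_results(), and only the first call does the expensive work.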
You may store your data in the Drupal cache system.
See the cache_set and cache_get functions for more information.
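A short sketch of that approach (the cache ID and expiry are assumptions; cache_set/cache_get are the Drupal 6-era API):
$cache = cache_get('mymodule_results');
if ($cache === FALSE) {
  // Not cached yet: do the expensive call once and keep it until the next cache clear.
  $results = call_remote_api_and_get_lots_of_results($arg1, $arg2, $arg3);
  cache_set('mymodule_results', $results, 'cache', CACHE_TEMPORARY);
}
else {
  $results = $cache->data;
}
Note that, unlike a static variable, this persists across requests, so clear or expire it whenever the remote data can change.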
