I am working on an issue with the design of a service that is essentially a redirection service.
The request link I get will contain some params (abc.com/param1=v1&param2=v2).
I need to do two things with this link:
1. Format the link and redirect the user to another domain, with some params passed (xyz.com/p1=v2) depending on the value of, say, param1. This step should be as fast as possible.
2. Save the link details to my DB after some processing.
I am planning to do this with an nginx + Lua (OpenResty) + (Redis or MongoDB?) combination.
As the two are unrelated tasks, I am planning to split them and do both asynchronously.
As the first task is a redirection, ngx.redirect("/link") seems apt for the case.
But the documentation says the redirect call terminates the processing of the current request.
How can I make these two tasks independent, so that the redirection happens as fast as possible and does not wait for the completion of the second task?
Can the storing be done by another thread, and how do I hand this job over to another thread?
Yes, of course you easily can. First of all, you have to properly understand the order of the Lua module directives; then, to run your MongoDB processing separately, you call it with ngx.location.capture($url), where $url is the URI of your second location block:
location /redirect/handling {
    # Lua code that formats the link, fires the subrequest below, and redirects
    content_by_lua_file url/to/your/code/forRedirectHandling;
}

location /mongo/save {
    internal;  # reachable only through subrequests such as ngx.location.capture
    content_by_lua_file url/to/mongodbHandlingCode;
}
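Inside the redirect-handling Lua file, a minimal sketch of the idea could look like the following (the location names and paths are the placeholders from above, and the xyz.com target comes from the question); note that ngx.location.capture waits for the subrequest to finish before the redirect is sent:

-- url/to/your/code/forRedirectHandling (sketch only)
local args = ngx.req.get_uri_args()

-- fire the subrequest that stores the link details
local res = ngx.location.capture("/mongo/save", { args = args })
if res.status ~= ngx.HTTP_OK then
    ngx.log(ngx.ERR, "saving link failed, status: ", res.status)
end

-- then redirect, e.g. based on param1 as in the question
return ngx.redirect("http://xyz.com/?p1=" .. (args.param1 or ""))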
The ngx.location.capture() call points to your second location block and runs your MongoDB code there as a subrequest handled by the nginx worker.
Please see the OpenResty documentation to know which directive to use (access_by_lua, log_by_lua, ...).
Hope this helps :)
The pprof package documentation says:
"The package is typically only imported for the side effect of registering its HTTP handlers. The handled paths all begin with /debug/pprof/."
The documentation says that if you already have an HTTP server running you don't need to start another one, but if you are not using DefaultServeMux, you will have to register the handlers with the mux you are using.
Shouldn't I always use a separate port for pprof? Is it okay to use the same port that I am using for Prometheus metrics?
net/http/pprof is a convenience package. It always registers handlers on DefaultServeMux, because DefaultServeMux is a global variable that it can actually do that with.
If you want to serve pprof results on some other ServeMux there's really nothing to it; all it takes is calling runtime/pprof.StartCPUProfile(w) with an http.ResponseWriter and then sleeping, or calling p.WriteTo(w, debug) on a runtime/pprof.Profile object. You can look at the source of net/http/pprof to see how it does it.
In a slightly better universe, net/http/pprof would have a RegisterHandlers(*http.ServeMux) function that could be used anywhere, you would be able to import it without anything being registered implicitly, and there would be another package (say net/http/pprof/sugar) that did nothing except call pprof.RegisterHandlers(http.DefaultServeMux) in its init. However, we don't live in that universe.
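In the meantime, if all you want is the usual /debug/pprof/ endpoints on your own mux (for example, the one already serving your Prometheus /metrics), you can register net/http/pprof's exported handler functions yourself. A minimal sketch, with the mux and port as illustrative choices:

package main

import (
	"log"
	"net/http"
	"net/http/pprof" // explicit import; its init() still touches DefaultServeMux, but we use our own mux below
)

func main() {
	mux := http.NewServeMux()

	// The same handlers the package registers on DefaultServeMux.
	// pprof.Index also serves the named profiles (heap, goroutine, ...) under /debug/pprof/<name>.
	mux.HandleFunc("/debug/pprof/", pprof.Index)
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)

	// ... register your metrics handler and anything else on the same mux ...

	log.Fatal(http.ListenAndServe(":8080", mux))
}

This is essentially the RegisterHandlers helper described above, written out by hand.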
I need to run a query that imports modules, from the pod.
Without importing modules, if I run a simple query with the database ID as below, it works:
let $queryParam := fn:concat("?query=",xdmp:url-encode($query),"&eval=",$dataBaseId,":123")
let $url := fn:concat($hostcqport,"/eval.xqy",$queryParam)
let $response := xdmp:http-post($url, $options)[2]
If I have import module statements, it throws an error (File Not Found).
So I tried getting the app-server ID and passing that instead of the database ID, as below:
let $queryParam := fn:concat("?query=",xdmp:url-encode($query),"&eval=",$serverId,":123")
let $url := fn:concat($hostcqport,"/eval.xqy",$queryParam)
let $response := xdmp:http-post($url, $options)[2]
How do I pass the server ID to make the query execute against a particular app server?
Is this MarkLogic 8 or earlier? I ask because the rewrite options on 8 allow for dynamic switching of module databases before execution (among lots of other amazing goodies). This may be what you want, because you can look at the query parameters at that point and build logic into the rewrite rules.
Otherwise, can you explain in more detail what you are trying to accomplish in the end? By the time your code runs, it is already executing in the context of a particular app server, so asking to execute against another app server by analysing the query parameters is a bit too late (because you are already using the app server).
[edit] The following is in response to the comments since provided. This is a messy response because the original question and comments still do not paint a completely clear picture, but if you stitch them together, a problem statement now exists that I can respond to.
The original author of the question confirmed via comments that they are "trying to hit an app server on a different node than the one that you actually posted to"
OK, this is the response to that clarification:
That is not possible. Your request is already being processed by a thread on the node that you hit with your HTTP request. MarkLogic is a cluster, but it does not share threads (or anything else, for that matter). Your choices are:
1. a redirect to the proper node, or
2. possibly using the current node to make the request on your behalf. But that ties up the first thread plus a thread on the other node, adds HTTP communication overhead, and you need to have an app server listening for this purpose.
If this is a fire-and-forget type of situation, then you can hit any node and save the data/request as a document in the DB, using a URI naming convention that indicates which app server it is for; insert triggers (keyed on the URI prefix for their server ID) can then pick up the request from the DB and process it.
I have read the example of scrapy-redis but still don't quite understand how to use it.
I have run the spider named dmoz and it works well, but when I start another spider named mycrawler_redis it just gets nothing.
Besides, I'm quite confused about how the request queue is set. I didn't find any piece of code in the example project which illustrates the request queue setting.
And if the spiders on different machines want to share the same request queue, how can I get that done? It seems that I should first make the slave machine connect to the master machine's Redis, but I'm not sure which part to put the relevant code in: in spider.py, or do I just type it on the command line?
I'm quite new to scrapy-redis and any help would be appreciated!
If the example spider is working and your custom one isn't, there must be something that you have done wrong. Update your question with the code, including all relevant parts, so we can see what went wrong.
Besides, I'm quite confused about how the request queue is set. I didn't find any piece of code in the example project which illustrates the request queue setting.
As far as your spider is concerned, this is done by appropriate project settings, for example if you want FIFO:
# Enables scheduling storing requests queue in redis.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
# Don't cleanup redis queues, allows to pause/resume crawls.
SCHEDULER_PERSIST = True
# Schedule requests using a queue (FIFO).
SCHEDULER_QUEUE_CLASS = 'scrapy_redis.queue.SpiderQueue'
As far as the implementation goes, queuing is done via RedisSpider, which your spider must inherit from. You can find the code for enqueuing requests here: https://github.com/darkrho/scrapy-redis/blob/a295b1854e3c3d1fddcd02ffd89ff30a6bea776f/scrapy_redis/scheduler.py#L73
As for the connection, you don't need to manually connect to the redis machine, you just specify the host and port information in the settings:
REDIS_HOST = 'localhost'
REDIS_PORT = 6379
And the connection is configured in connection.py: https://github.com/darkrho/scrapy-redis/blob/a295b1854e3c3d1fddcd02ffd89ff30a6bea776f/scrapy_redis/connection.py
Examples of usage can be found in several places: https://github.com/darkrho/scrapy-redis/blob/a295b1854e3c3d1fddcd02ffd89ff30a6bea776f/scrapy_redis/pipelines.py#L17
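For completeness, here is a minimal sketch of a Redis-fed spider; the class name and redis_key below are illustrative, not taken from your project:

from scrapy_redis.spiders import RedisSpider

class MyCrawlerRedis(RedisSpider):
    name = 'mycrawler_redis'
    redis_key = 'mycrawler:start_urls'  # Redis list the spider pops its start URLs from

    def parse(self, response):
        # Replace with your real extraction logic.
        yield {'url': response.url}

Run the same spider on every machine with REDIS_HOST/REDIS_PORT pointing at the master's Redis, then seed the queue from the command line, for example: redis-cli lpush mycrawler:start_urls http://www.example.com/ (the URL is just an example).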
I need to create an asynchronous scheduler inside an nginx server to update a variable. Let me give you an example of what I mean by this and why I need it.
Imagine config file that looks something like this:
http {
    lua_shared_dict foo 5m;

    server {
        location /set {
            content_by_lua '
                local foo = ngx.shared.foo
                ngx.say(foo:get("12345"))
            ';
        }
    }
}
I specified a variable foo that resides in shared memory, and all worker processes have access to it. What I want to do is set those values from a Lua script that is called every minute. For reference, it will go to Redis, retrieve the necessary data, and update this variable. I know I could do this in content_by_lua on every call, but that is highly inefficient for a huge volume of traffic.
I would like a separate process that is triggered every minute or so to just go and do this one task. Is there anything like this in nginx, or are there any modules that could help me with that?
You can use the new ngx.timer API provided by ngx_lua. See the documentation for details:
https://www.nginx.com/resources/wiki/modules/lua/#ngx-timer-at
You can create a new timer in your timer handler to make the timer keep triggering, like a cron job ;)
BTW, timers are per worker process; you can use the lua-resty-lock library in your timer handler to ensure that only one timer is active at a time across all the nginx workers: https://github.com/agentzh/lua-resty-lock
You can also use the ngx.timer.every API; it is recommended over recursively calling ngx.timer.at.
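Putting the two answers together, here is a minimal sketch that assumes the lua_shared_dict foo from the question and uses a placeholder in place of the real Redis lookup:

# in the http block of nginx.conf
init_worker_by_lua_block {
    -- Placeholder for the real lua-resty-redis lookup.
    local function fetch_from_redis()
        return "value-from-redis"
    end

    local function refresh(premature)
        if premature then
            return  -- the worker is shutting down
        end
        local value = fetch_from_redis()
        if value then
            ngx.shared.foo:set("12345", value)
        end
    end

    -- Runs in every worker; wrap the body in lua-resty-lock (or a shared-dict flag)
    -- if only one worker should do the refresh.
    local ok, err = ngx.timer.every(60, refresh)
    if not ok then
        ngx.log(ngx.ERR, "failed to create timer: ", err)
    end
}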
I have a PHP script which does what the accepted answer described here does.
It doesn't work unless I add the following before fclose($fp):
while (!feof($fp)) {
$httpResponse .= fgets($fp, 128);
}
Even a blank for loop would do the job instead of the above!
But what's the point? I wanted async calls :(
To add to my pain, the same code runs fine without the above snippet in an Apache-driven environment.
Does anybody know whether Nginx or php-fpm has a problem with such requests?
What you're looking for can only be done on Linux flavor systems with a PHP build that includes the Process Control functions (PCNTL library).
You'll find its documentation here:
http://php.net/manual/en/book.pcntl.php
Specifically what you want to do is "fork" a process. This creates an identical copy of the current PHP script's process including all memory references and then allows both scripts to continue executing simultaneously.
The "parent" script is aware that it is still the primary script. And the "child" script (or scripts, you can do this as many times as you want) is aware that is is a child. This allows you to choose a different action for the parent and the child once the child is spun off and turned into a daemon.
To do this, you'd use something along these lines:
$pid = pcntl_fork(); //store the process ID of the child when the script forks
if ($pid == -1) {
die('could not fork'); // -1 return value means the process could not fork properly
} else if ($pid) {
// a process ID will only be set in the parent script. this is the main script that can output to the user's browser
} else {
// this is the child script executing. Any output from this script will NOT reach the user's browser
}
That will enable a script to spin off a child process that can continue executing alongside (or long after) the parent script outputs its content and exits.
You should keep in mind that these functions must be compiled into your PHP build and that the vast majority of hosting companies will not allow access to them on their servers. In order to use these functions, you generally need a Virtual Private Server (VPS) or a dedicated server. Not even cloud hosting setups will usually offer these functions, since, if used incorrectly (or maliciously), they can easily bring a server to its knees.