MarkLogic 8: Schedule XQuery extraction - xquery

I'm currently using XQuery queries (launched via the API) to extract data from MarkLogic 8.0.6.
The query in my file extract_data.xqy:
xdmp:save("toto.csv",let $nl := "
"
return
document {
for $data in collection("http://book/polar")
return ($data)
})
API call:
curl --anyauth --user ${MARKLOGIC_USERNAME}:${MARKLOGIC_PASSWORD} -X POST -i -d @extract_data.xqy \
  -H "Content-type: application/x-www-form-urlencoded" \
  -H "Accept: multipart/mixed; boundary=BOUNDARY" \
  "$node:$port/v1/eval?database=$db_name"
It works fine, but I'd like to schedule this extract directly in MarkLogic and have it run in the background, to avoid timeouts if the request takes too long to execute.
Is there a feature like that?
Regards,
Romain.

You can use the task scheduler to set up recurring script execution.
The timeout can be adjusted in the script with xdmp:set-request-time-limit.
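For instance, here is a hedged sketch of raising the limit on the ad-hoc eval call from the question; it assumes extract_data.xqy has been deployed under the app server's modules root, and 3600 seconds is an arbitrary value:

# Sketch only: raise the request time limit, then invoke the deployed module.
# Assumes /extract_data.xqy exists in the modules database; adjust as needed.
curl --anyauth --user "$MARKLOGIC_USERNAME:$MARKLOGIC_PASSWORD" -X POST \
  -H "Content-type: application/x-www-form-urlencoded" \
  --data-urlencode 'xquery=xdmp:set-request-time-limit(3600), xdmp:invoke("/extract_data.xqy")' \
  "$node:$port/v1/eval?database=$db_name"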
I would suggest you take a look at MLCP as well.

As suggested by Mads, a tool like CORB can help pull CSV data out of MarkLogic.
A schedule, as suggested by Michael, can trigger a periodic export and save the output to disk, or push it elsewhere via HTTP. I'd look into running incremental exports in that case, though, and I'd also suggest batching things up. In a large cluster, I'd even suggest chunking the export into batches per forest, or per host on which content forests are attached. Scheduled tasks allow targeting specific hosts on which they should run.
You can also run an ad-hoc export, particularly if you batch up the work using a tool like taskbot. And if you combine it with its OPTIONS-SYNC-UPDATE mode, you can merge multiple batches back into one result file before emitting it, and get better performance than running it single-threaded. Merging results doesn't scale endlessly, but if you have a relatively small dataset (only a few million small records, say), that might be sufficient.
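If you go the CORB route, a rough sketch of what an invocation might look like (the jar name, connection URI, and the two module file names below are assumptions; check the CORB documentation for the real options):

# Hedged CORB2-style sketch: get-uris.xqy and export-doc.xqy are placeholder
# modules you would write to select documents and emit one CSV line each.
java -cp corb.jar com.marklogic.developer.corb.Manager \
  XCC-CONNECTION-URI="xcc://user:password@host:8000/mydb" \
  URIS-MODULE=get-uris.xqy \
  PROCESS-MODULE=export-doc.xqy \
  THREAD-COUNT=8 \
  EXPORT-FILE-NAME=toto.csv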
HTH!

Related

Track failure of command on a Salt minion

I have been using Salt for the last month. Whenever I run a command, say sudo salt '*' test.ping, the master pings all the minions and the response is the list of all the minions which are up and running. The output looks something like:
{
    "minion_1": true
}
{
    "minion_2": true
}
{
    "minion_3": true
}
In the master's conf file, the return type is configured to JSON.
But if I execute an incorrect command through the salt master, say sudo salt '*' test1.ping, then the master returns something like this:
{
    "minion_1": "'test1.ping' is not available."
}
{
    "minion_2": "'test1.ping' is not available."
}
{
    "minion_3": "'test1.ping' is not available."
}
In both the outputs displayed above, the command exits with a success code on the master's shell/terminal. How do we track which minions were not able to execute the command? I am not interested in what type of error it is; I just need some way or other to track the minions which failed to execute the command.
The last resort is to write a parser which reads the complete output and decides for itself. I hope there is a better solution.
Reasons to despair
I would not rely on Salt's CLI exit code at the moment (version 2014.7.5) - there are still many issues open to solve this.
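A quick way to see this for yourself, using the failing command from the question:

# In the affected versions, both of these exit 0 on the master,
# even though test1.ping does not exist on the minions.
sudo salt '*' test1.ping
echo "exit code: $?"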
Get valid JSON output
There is the --static option, which fixes the JSON output:
If using --out=json, you will probably want --static as well. Without the static option, you will get a JSON string for each minion.
Otherwise the output given by Salt above contains multiple objects (one per minion), which is not valid JSON (JSON requires a single object, array, or value per document), so simply loading the entire output with a standard JSON parser will fail. It is even mentioned in the documentation (as of 5188d6c):
Some JSON parsers can guess when an object ends and a new one begins but many can not.
In addition to that, some Salt options (like show_jid) also send strings to STDOUT, which mixes them with the execution report and invalidates the JSON output format. Option --static solves this problem as well.
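For example, to get one well-formed JSON document for the whole run:

# --static collects all minion returns into a single JSON document
salt '*' test.ping --out=json --static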
UPDATE: Parser to detect failure in Salt execution
This problem bothered me so much that I quickly gave birth to this Python script @ 75e42af, with an example of how it is used @ b819961d.
NOTE: This won't handle the output of arbitrary Salt commands (including test.ping above), but issues related to the output of state execution are covered. There is still a solution to the test.ping problem above - it can be run from a state, and the output can then be analysed by the script. See how to call an execution module from within a state or *.sls file in this answer.
Features (details in the code itself):
Handle output from both highstate and orchestrate runners.
Handle output of multiple minions and any number of commands.
Report summary "? of N" and overall result.
Standalone file usable as script and module.
The only limitation is that it requires JSON output (Salt option --out json), simply because that makes it easy to fix the issues discussed above before feeding the output to the parser.
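A hypothetical usage sketch (the script and file names here are placeholders, not the actual names from the linked commits):

# Capture a highstate run as one JSON document, then hand it to the parser.
salt '*' state.highstate --out=json --static > highstate.json
python salt_state_parser.py highstate.json   # placeholder name for the script above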
The above parser will only work for the test.ping command.
If multiple commands have to be executed, we will have to write a more robust parser.

Monitoring SaltStack

Is there anything out there to monitor SaltStack installations besides Halite? I have it installed, but it's not really what we are looking for.
It would be nice if we could have a web gui or even a daily email that showed the status of all the minions. I'm pretty handy with scripting but I don't know what to script.
Anybody have any ideas?
In case by monitoring you mean operating salt, you can try one of the following:
SaltStack Enterprise GUI
Foreman
SaltPad
Molten
Halite (DEPRECATED by SaltStack)
These GUIs will allow you to do more than just know whether or not minions are alive. They will allow you to operate on them in the same manner you could with the salt client.
And in case by monitoring you mean just checking whether the salt master and salt minions are up and running, you can use a general-purpose monitoring solution like:
Icinga
Naemon
Nagios
Shinken
Sensu
In fact, these tools can monitor different services on the hosts they know about. A host can be any machine that has an IP address, and a service can be any resource that can be queried via the underlying OS. Examples of hosts: a server, a router, a printer... Examples of services: memory, disk, a process, ...
Not an absolute answer, but we're developing SaltPad, which is a replacement for and improvement of Halite. One of its features is displaying the status of all your minions. You can give it a try: SaltPad project page on GitHub
You might look into Consul. While it isn't specifically for SaltStack, I use it to monitor that salt-master and salt-minion are running on the hosts they should be.
Another simple test would be to run something like:
salt --output=json '*' test.ping
And compare the output between different runs. It's not amazing monitoring, but at least it shows that your minions are up and communicating with your master.
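A small sketch of that idea (the file paths are arbitrary):

# Record the current ping results and diff them against the previous run.
salt --output=json --static '*' test.ping > /tmp/ping_now.json
diff /tmp/ping_prev.json /tmp/ping_now.json || echo "minion status changed"
mv /tmp/ping_now.json /tmp/ping_prev.json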
Another option might be to use the salt.runners.manage functions, which include a status function.
To print the status of all known salt minions, you can run this on your salt master:
salt-run manage.status
salt-run manage.status tgt="webservers" expr_form="nodegroup"
I had to write my own. To my knowledge, there is nothing out there which will do this, and Halite didn't work for what I needed.
If you know Python, it's fairly easy to write an application to monitor Salt. For example, my app had a thread which refreshed the list of hosts from the salt keys from time to time, and a few threads that ran various commands against that list to verify the hosts were up. The monitor threads updated a dictionary with a timestamp and success/fail for each host after they ran. It had a hacked-together HTML display, color coded to reflect the status of each node. Took me about half a day to write it.
If you don't want to use Python, you could, painfully, do something similar with this inefficient, quick, untested hack using command-line tools in bash:
# List the accepted minions (adjust the grep/sed if your salt-key output differs).
minion_list=$(salt-key --out=txt | grep '^minions:' | sed 's/^minions: //' | tr ',' ' ')
for minion in ${minion_list}; do
    if ! salt "${minion}" test.ping > /dev/null 2>&1; then
        echo "${minion} is down."
    fi
done
It would be easy enough to modify it to write to a file or send an alert.
Halite was deprecated in favour of a paid UI version - sad, but true - still, SaltStack does the job. I'd guess the best monitoring will be the one you write yourself. Happily, there's the salt-api project (which I believe was part of Halite, though I'm not sure about that); I'd recommend using it with Tornado, as it's better than the CherryPy version.
So if you want a nice interface, you might want to work with the API once you set it up. When setting up Tornado, make sure you're OK with authentication (I had some trouble there). Here's how you can check it; a curl sketch follows after the list:
Using Postman/curl/whatever:
Check if the API is alive:
- no POST data (just see if the API is alive)
- GET request to http://masterip:8000/
Log in (you'll need to take the token returned from here to do most operations):
- POST to http://masterip:8000/login
- (x-www-form-urlencoded data in Postman), raw:
  username:yourUsername
  password:yourPassword
  eauth:pam
I'm using PAM, so I have a user with yourUsername and yourPassword added on my master server (as a regular user; that's how PAM works).
Get minions: http://masterip:8000/minions (you'll need to pass the token from the login operation).
Get all jobs: http://masterip:8000/jobs (you'll need to pass the token from the login operation).
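A minimal curl version of that flow (masterip and the credentials are placeholders; the X-Auth-Token header is how salt-api expects the token to be passed):

# Log in and note the "token" value in the response.
curl -si http://masterip:8000/login \
  -H 'Accept: application/json' \
  -d username=yourUsername -d password=yourPassword -d eauth=pam
# Then use the token to list minions.
curl -s http://masterip:8000/minions \
  -H 'Accept: application/json' \
  -H 'X-Auth-Token: <token from the login response>'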
So basically, if you want to do anything with SaltStack monitoring, just play with that salt-api and get what you want. SaltStack has output formatters, so you can get all the data even as JSON (if your frontend is JavaScript-based). It lets you run commands or whatever you want, and the monitoring is left to you (unless you switch from the community to the pro version), or unless you want to use the aforementioned SaltPad (which, sorry guys, was last updated a year ago according to the repo).
By the way, you might need to change that 8000 port to something else, depending on your version of SaltStack/Tornado/config.
Basically, if you want an output where you can check the status of all the minions, you can run a command like:
salt '*' test.ping
salt --output=json '*' test.ping   # to get the output in JSON format
salt-run manage.up                 # lists the minions that are up
Or, if you want to visualize the same with a dashboard, you can look at some of the available options like Foreman, SaltPad, etc.

Ceilometer API to use and its result parameters

I was trying to fetch the resources and resource usage of an instance using the Ceilometer API. I have used v2/meters/instance, v2/meters/cpu_util and v2/meters/memory. The results these APIs return are too large, and I'm not able to identify the parameter that needs to be used to find the resource usage. I need to find the CPU utilization, bandwidth and memory usage of an instance using the Ceilometer API. Can anyone please tell me which API to use to get the CPU utilization, bandwidth and memory usage of an instance, and which parameter needs to be read to get the usage?
Thanks for any help in advance.
Regards ,
Lokesh.S
If you use the CLI, you can limit the number of samples with the -l/--limit parameter, as in the example below:
ceilometer sample-list -m cpu_util -l 10
ceilometer --debug sample-list -m cpu_util -l 1 -q resource={your_vm_id}
Note that:
- --debug lets you observe which REST API call was actually requested; you can learn from it and write your own REST request, or just use the CLI. This option also shows the full REST response with detailed sample information; the CLI formats it, and some information may be dropped.
- -l 1 limits the output to a single result, so you are not flooded by a huge amount of data.
- -q resource={your_vm_id} only fetches cpu_util samples for that VM.
You can read the official documentation at http://docs.openstack.org/developer/ceilometer/webapi/v2.html, or my post at http://zqfan.github.io/assets/doc/ceilometer-havana-api-v2.html (which is written in Chinese).
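If aggregated values are what you need rather than raw samples, the CLI's statistics subcommand may also help; a minimal sketch, with {your_vm_id} as a placeholder and an assumed one-hour period:

# Average cpu_util per 1-hour period for one instance.
ceilometer statistics -m cpu_util -q resource={your_vm_id} -p 3600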

Apigee proxy performance tuning

Is there a recommended/best way to generate a report of the time spent in each policy of an API proxy?
Currently my approach is to use JS to collect timestamps and calculate the delay around each policy, and then report it using the stats collection policy.
That's too invasive for performance checks, and my data collection alone adds time to the overall response.
What would be the best non-invasive way to report the time taken for each step when analyzing the data across many requests? (The UI, in trace mode, does show the time for each policy on an individual-request basis.)
Thanks,
Ricardo
There's no public API supported to calculate this information and return a nice, clean response of aggregated policy execution time data. Your best bet is to try Analytics reports with the request_processing_latency and response_processing_latency measures (http://apigee.com/docs/content/analytics-reference). Then, if needed, use trace to identify policy execution times.
Alternatively, you can try downloading the trace session and parsing the timestamps between policies to build your information, but trace in the UI does this already.
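A hedged sketch of querying those measures via the Analytics stats API (the endpoint shape follows the Apigee analytics docs; org, env, and the time range are placeholders):

# Average request/response processing latency per API proxy over a time range.
curl -u $ae_username:$ae_password \
  "https://api.enterprise.apigee.com/v1/organizations/{org}/environments/{env}/stats/apis?select=avg(request_processing_latency),avg(response_processing_latency)&timeRange=01/01/2016%2000:00~01/02/2016%2000:00"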
You could consider using the Debug API: http://apigee.com/docs/api/debug-sessions
First you'll need to start a session, for example:
curl -H "Content-type:application/octet-stream" -X POST https://api.enterprise.apigee.com/v1/organizations/{org}/environments/{env}/apis/{api_name}/revisions/{revision #}/debugsessions?"session=MySession" \
-u $ae_username:$ae_password
Get info from session:
curl -X GET -H "Accept:application/json" \
https://api.enterprise.apigee.com/v1/organizations/{org}/environments/{env}/apis/{api_name}/revisions/{revision #}/debugsessions/MySession/data \
-u $ae_username:$ae_password
The time spent in each policy can be found using the debug trace in the UI.
Also, as Diego said, you can use the debugsessions API call to create a debug session.
For the debug session you can also define a time limit for how long you want the session to run. With this, if you are running your performance test for 1 hour, you can create a debug session for that amount of time:
curl -v -u jhans@apigee.com -X POST \
  "http://management:8080/v1/organizations/weatherapi/environments/prod/apis/ForeCast/revisions/6/debugsessions?session=ab&timeout=300"
From the UI you can download the trace session, which contains an XML file with a timestamp for each policy:
<Point id="Condition">
<DebugInfo>
<Timestamp>05-02-14 04:38:14:088</Timestamp>
<Properties>
<Property name="ExpressionResult">true</Property>
</Point>
<Point id="StateChange">
The above is an example of the timestamps recorded for a policy in the debug trace downloaded from the UI.
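If you want to pull those timestamps out programmatically, a rough sketch (assumes the downloaded session was saved as trace.xml and that xmllint is installed):

# Extract every execution point's timestamp from the downloaded trace file.
# Output is concatenated; post-process as needed to pair points with deltas.
xmllint --xpath '//Point/DebugInfo/Timestamp/text()' trace.xml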
Ricardo,
here is what I suggest.
Disclaimer: it is very meticulous and time consuming. I would recommend this approach only when you are really blocked on a performance issue and there is no other solution.
Let us say your proxy has a few policies, a service callout to an external service, and a backend.
The total latency would then be the sum of the time taken by the policies (p1, p2, p3, ...) + the service callout target + the time taken by your backend.
The very first step would be to stub out the external dependencies. You can use a null target (a stub proxy on Apigee Edge without any logic) to do so.
Now disable all other policies (enabled="false" in the policy schema). Conduct a load test and benchmark your proxy performance against the stubbed endpoints, with no policies active.
Start activating the policies one by one, or a few at a time, and re-run the load test each time.
Finally, run the load test against the real backends (removing the stubs).
At the end of this series of load tests you will know which policy or backend is making the most significant performance impact.

Get File Creation Date Over HTTP

Given a file on a webserver (e.g., http://foo.com/bar.zip -> only accessible through HTTP), is there any way to get its date attributes (e.g., date created, date modified) without downloading the entire archive in the first place?
Right now, I download the archive and read the attributes programmatically. The trouble is that the archive is dozens of MiB, so it seems like a waste of resources to download the entire thing and end up reading off just a couple of bytes of information.
I realize that bandwidth is practically free, but I don't like to be wasteful in any case.
Try reading the Last-Modified field from the response header.
Be sure to use an HTTP HEAD request instead of an HTTP GET request, so that only the HTTP headers are read. If you do an HTTP GET, you will download the whole file regardless, even if you only intend to inspect the HTTP headers.
Just for the sake of simplicity, here's a compilation of the existing (perfect) answers from @ihorko and @JanThomä, using curl. Other options are available too, of course, but here's a fully functional answer.
Use curl with the -I option:
-I, --head
(HTTP/FTP/FILE) Fetch the HTTP-header only! HTTP-servers feature the command HEAD which this uses to get nothing but the header of a document. When used on an FTP or FILE file, curl displays the file size and last modification time only.
Also, the -s option is nice here:
-s, --silent
Silent or quiet mode. Don't show progress meter or error messages. Makes Curl mute. It will still output the data you ask for, potentially even to the terminal/stdout unless you redirect it.
Hence, something like this would do the trick:
curl -sI http://foo.com/bar.zip | grep 'Last-Modified' | cut -d' ' -f 2-
