How to display the logged information on an aerospike server as a graph? - graphite

I would like to log some stats on some aerospike nodes and analyse the same.
I found that Aerospike comes with a tool called asgraphite, which seems to be using a forked version of an existing Graphite integration.
The asgraphite integration guide mentions some commands which are supposed to, for example, start logging. I can already run the following command on my node and see the expected output, so it looks like I am all set to start logging:
python /opt/aerospike/bin/asgraphite --help
(By the way, we are running the Community Edition, which, it seems, does not provide historical latency stats in the AMC dashboard.)
However, I don't see any information on how to view the data logged this way. I am expecting a web interface of the kind Graphite usually provides, like this:

Related

Can I instrument Firebase Functions with OpenTelemetry to analyse traces in Cloud Trace? If yes, how?

I have many v1 Cloud Functions, written in TypeScript and running on Node.js 18, deployed with "firebase deploy functions". Completely serverless.
If I open Cloud Trace ( https://console.cloud.google.com/traces ), I find all my functions, but there are no traces, just one single long bar (yes of course, I am not collecting them...)
Now, my question is, can I add tracing mechanism to my functions so to see detailed traces? How would I go about that?
I found this guide, https://cloud.google.com/trace/docs/setup/nodejs-ot but it is either for Compute Engine or Google Kubernetes Engine. And Firebase Functions are neither of these...
Also, I know that sometimes the function instance is disposed and other times it is not, as Google tries to reuse resources (indeed, a second call to one of my functions within a minute is way faster than the initial call). In that case, wouldn't this "recycling" be an issue for the sake of traces?
Edit: what I did (thanks to Doug for your comment).
I haven't set up anything at all for tracing, my question is to know if it is possible and how to do it.
The only documentation I found, as said before, is https://cloud.google.com/trace/docs/setup/nodejs-ot, and it seems that said doc does not apply to my case (I am not on Compute Engine nor on Google Kubernetes Engine). I have also searched the documentation mentioned by Doug, https://cloud.google.com/trace, but I could not find a practical step-by-step implementation guide.
So if you want to reproduce my behaviour:
firebase init a project
then write a function definition
deploy it with firebase deploy --only functions
go to trace on google cloud platform: https://console.cloud.google.com/traces
observe that the function, as complex as it is (it calls different subfunctions, makes multiple other HTTP requests, etc.), has no detailed trace, just one single span as depicted in the picture above

bokeh sample data download fail with 'HTTPError: HTTP Error 403: Forbidden'

When trying to download the Bokeh sample data following the instructions at https://docs.bokeh.org/en/latest/docs/installation.html#sample-data, it fails with HTTP Error 403: Forbidden.
In the conda prompt:
bokeh sampledata (failed)
In a Jupyter notebook:
import bokeh.sampledata
bokeh.sampledata.download() (failed)
TL;DR: you will either need to upgrade to Bokeh version 1.3 or later, or else you can manually edit the bokeh.util.sampledata module to use the new CDN location http://sampledata.bokeh.org. You can see the exact change to make in PR #9075.
The bokeh.sampledata module originally pulled data directly from a public AWS S3 bucket location hardcoded in the module. This was a poor choice that left open the possibility for abuse, and in late 2019 an incident finally happened where someone (intentionally or unintentionally) downloaded the entire dataset tens of thousands of times over a three-day period, incurring a significant monetary cost. (Fortunately, AWS saw fit to award us a credit to cover this anomaly.) Starting in version 1.3, sample data is only accessed from a proper CDN with much better cost structures. All public direct access to the original S3 bucket was removed. This change had the unfortunate effect of immediately breaking bokeh.sampledata for all previous Bokeh versions; however, as an open-source project we simply cannot afford the real (and potentially unlimited) financial risk exposure.
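If you go the upgrade route, the check is short. A minimal sketch, assuming a pip-installed Bokeh:

# After upgrading, e.g. pip install --upgrade "bokeh>=1.3", re-run the download:
import bokeh
import bokeh.sampledata

print(bokeh.__version__)       # should report 1.3 or later
bokeh.sampledata.download()    # now fetches from the CDN instead of the old S3 bucket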

Monitoring SaltStack

Is there anything out there to monitor SaltStack installations besides halite? I have it installed but it's not really what we are looking for.
It would be nice if we could have a web GUI or even a daily email that showed the status of all the minions. I'm pretty handy with scripting, but I don't know what to script.
Anybody have any ideas?
In case by monitoring you mean operating salt, you can try one of the following:
SaltStack Enterprise GUI
Foreman
SaltPad
Molten
Halite (DEPRECATED by SaltStack)
These GUIs will allow you to do more than just know whether or not minions are alive. They will let you operate on them in the same manner you could with the salt client.
And in case by monitoring you mean just checking whether the salt master and salt minions are up and running, you can use a general-purpose monitoring solution like:
Icinga
Naemon
Nagios
Shinken
Sensu
In fact, these tools can monitor different services on the hosts they know about. A host can be any machine that has an IP address, and a service can be any resource that can be queried via the underlying OS. Examples of hosts: a server, a router, a printer... Examples of services: memory, disk, a process, ...
Not an absolute answer, but we're developing SaltPad, which is a replacement for and improvement of Halite. One of its features is to display the status of all your minions. You can give it a try: SaltPad project page on GitHub
You might look into Consul. While it isn't specifically for SaltStack, I use it to monitor that salt-master and salt-minion are running on the hosts they should be.
Another simple test would be to run something like:
salt --output=json '*' test.ping
And compare between different runs. It's not amazing monitoring, but at least shows your minions are up and communicating with your master.
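A small Python sketch of that comparison (this is just an illustration, and assumes the --static flag so the salt CLI prints a single JSON document of minion id to true/false):

import json
import subprocess

def responding_minions():
    # --static makes the CLI wait for all returns and print one JSON document
    out = subprocess.run(["salt", "--out=json", "--static", "*", "test.ping"],
                         capture_output=True, text=True).stdout
    return {m for m, ok in json.loads(out or "{}").items() if ok is True}

baseline = responding_minions()   # snapshot taken when everything is known-good
# ... some time later ...
missing = baseline - responding_minions()
if missing:
    print("Minions no longer responding:", ", ".join(sorted(missing)))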
Another option might be to use the salt.runners.manage functions, which come with a status function.
In order to print the status of all known salt minions you can run this on your salt master:
salt-run manage.status
salt-run manage.status tgt="webservers" expr_form="nodegroup"
I had to write my own. To my knowledge, there is nothing out there which will do this, and halite didn't work for what I needed.
If you know Python, it's fairly easy to write an application to monitor Salt. For example, my app had a thread which refreshed the list of hosts from the salt keys from time to time, and a few threads that ran various commands against that list to verify they were up. The monitor threads updated a dictionary with a timestamp and success/fail for each host after they ran. It had a hacked-together HTML display, color-coded to reflect the status of each node. It took me about half a day to write it.
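A rough Python sketch of that kind of poller (not the original app; it simply shells out to the salt and salt-key CLIs, so it assumes it runs on the master as a user allowed to call them):

import json
import subprocess
import threading
import time

status = {}                    # minion id -> {"ok": bool, "when": unix timestamp}
status_lock = threading.Lock()

def list_minions():
    # salt-key --out=json prints the key lists; "minions" holds the accepted keys
    out = subprocess.run(["salt-key", "--out=json"], capture_output=True, text=True).stdout
    return json.loads(out or "{}").get("minions", [])

def check_minion(minion):
    # ping a single minion and record success/failure with a timestamp
    out = subprocess.run(
        ["salt", "--out=json", "--static", "--timeout=10", minion, "test.ping"],
        capture_output=True, text=True).stdout
    try:
        ok = json.loads(out).get(minion) is True
    except ValueError:
        ok = False
    with status_lock:
        status[minion] = {"ok": ok, "when": time.time()}

def monitor_loop(interval=60):
    while True:
        threads = [threading.Thread(target=check_minion, args=(m,)) for m in list_minions()]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        with status_lock:
            print(json.dumps(status, indent=2))    # replace with an HTML view, email alert, etc.
        time.sleep(interval)

if __name__ == "__main__":
    monitor_loop()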
If you don't want to use Python, you could, painfully, do something similar to this inefficient, quick, untested hack using command line tools in bash.
minion_list=$(salt-key --out=txt | grep '^minions_pre:.*' | tr ',' ' ')  # pull the minion list out of salt-key's text output
for minion in ${minion_list}; do
    salt "${minion}" test.ping
    if [ $? -ne 0 ]; then
        echo "${minion} is down."
    fi
done
It would be easy enough to modify to write file or send an alert.
Halite was deprecated in favour of the paid UI version, sad but true - still, SaltStack does the job. I'd guess your best monitoring will be the one you write yourself; happily there's the salt-api project (which I believe was part of Halite, though I'm not sure about this). I'd recommend using it with the Tornado version, as it's better than the CherryPy one.
So if you want a nice interface, you might want to work with the API once you set it up. When setting up Tornado, make sure authentication is OK (I had some trouble here); here's how you can check it:
Using Postman/Curl/whatever:
Check if the API is alive:
- no POST data (just see if the API is alive)
- GET request to http://masterip:8000/
Log in (you'll need the token returned here to do most operations):
- POST to http://masterip:8000/login
- x-www-form-urlencoded data (raw, in Postman):
username:yourUsername
password:yourPassword
eauth:pam
I'm using PAM, so I have a user with yourUsername and yourPassword added on my master server (as a regular user; that's how PAM works).
Get minions: http://masterip:8000/minions (you'll need to send the token from the login operation).
Get all jobs: http://masterip:8000/jobs (you'll need to send the token from the login operation).
So basically, if you want to do anything with SaltStack monitoring, just play with that salt-api and get what you want; SaltStack has output formatters, so you can get all the data even as JSON (if your frontend is JavaScript-like). It lets you run commands or whatever you want, and the monitoring is left to you (unless you switch from the community to the pro version), or unless you want to use the aforementioned SaltPad (which, sorry guys, was last updated a year ago according to the repo). A minimal Python sketch against this API follows below.
BTW, you might need to change that 8000 port to something else depending on your SaltStack/Tornado version and config.
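If you'd rather script it than use Postman, here is a rough sketch of the same three calls with the Python requests library (masterip, yourUsername and yourPassword are placeholders, and the exact response shape may vary across salt-api versions):

import requests

API = "http://masterip:8000"

# 1. Check the API is alive: a plain GET to the root should answer.
print(requests.get(API).status_code)

# 2. Log in with PAM credentials; the response contains a session token.
login = requests.post(API + "/login", data={
    "username": "yourUsername",
    "password": "yourPassword",
    "eauth": "pam",
}).json()
token = login["return"][0]["token"]

# 3. Reuse the token in the X-Auth-Token header for the other endpoints.
headers = {"X-Auth-Token": token, "Accept": "application/json"}
print(requests.get(API + "/minions", headers=headers).json())
print(requests.get(API + "/jobs", headers=headers).json())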
Basically, if you want an output where you can check the status of all the minions, you can run a command like:
salt '*' test.ping
salt --output=json '*' test.ping    # to get the output in JSON format
salt-run manage.up                  # lists the minions that are currently up
Or, if you want to visualize the same with a dashboard, you can look at some of the available options like Foreman, SaltPad, etc.

Indefinite Provisioning of EMR Cluster with Segue in R

I am trying to use the R package called Segue by JD Long, which is lauded as the ultimate in simplicity for using R with AWS by a book I read called "Parallel R".
However, for the 2nd day in a row I've run into a problem where I initiate the creation of a cluster and it just says STARTING indefinitely.
I tried this on OS X and in Linux with clusters of sizes 2, 6, 10, 20, and 25. I let them all run for at least 6 hours. I have no problem starting a cluster in the AWS EMR Management Console, though I have no clue how to connect Segue/R to a cluster that was started in the Management Console instead of via createCluster().
So my question is -- is there either some way to troubleshoot the provisioning of the cluster, or to bypass the problem by creating the cluster manually and somehow getting Segue to work with that?
Here's an example of what I'm seeing:
library(segue)
Loading required package: rJava
Loading required package: caTools
Segue did not find your AWS credentials. Please run the setCredentials() function.
setCredentials("xxx", "xxx")
emr.handle <- createCluster(numInstances=10)
STARTING - 2013-07-12 10:36:44
STARTING - 2013-07-12 10:37:15
STARTING - 2013-07-12 10:37:46
STARTING - 2013-07-12 10:38:17
.... this goes on for hours and hours and hours...
UPDATE: After 36 hours and many failed attempts, this began working (seemingly at random) when I tried it with 1 node. I then tried it with 10 nodes and it worked great. To my knowledge nothing changed locally or on AWS...
I am answering my own question on behalf of the AWS support rep who gave me the following belated explanation:
The problem with the EMR creation is with the Availability Zone specified (us-east-1c): this Availability Zone is now constrained and doesn't allow the creation of new instances, so the job was trying to create the instances in an infinite loop.
You can see information about constrained AZ here: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-regions-availability-zones
"As Availability Zones grow over time, our ability to expand them can become constrained. If this happens, we might restrict you from launching an instance in a constrained Availability Zone unless you already have an instance in that Availability Zone. Eventually, we might also remove the constrained Availability Zone from the list of Availability Zones for new customers. Therefore, your account might have a different number of available Availability Zones in a region than another account."
So you need to specify another AZ, or, what I recommend, not specify any AZ at all, so EMR will be able to select any available one.
I found this thread: https://groups.google.com/forum/#!topic/segue-r/GBd15jsFXkY
on Google Groups, where the topic of availability zones came up before. The zone that was set as the new default in that thread was the zone causing problems for me. I am attempting to edit the source of Segue.
Jason, I'm the author of Segue so maybe I can help.
Please look under the details section in the lower part of the AWS console and see if you can determine if the bootstrap sequences completed. This is an odd problem because typically an error at this stage is pervasive across all users. However I can't reproduce this one.

have R halt the EC2 machine it's running on

I have a few work flows where I would like R to halt the Linux machine it's running on after completion of a script. I can think of two similar ways to do this:
run R as root and then call system("halt")
run R from a root shell script (could run the R script as any user) then have the shell script run halt after the R bit completes.
Are there other easy ways of doing this?
The use case here is for scripts running on AWS where I would like the instance to stop after script completion so that I don't get charged for machine time after the job runs. The instance I use for data analysis is an EBS-backed instance, so I don't want to terminate it, simply suspend it. Issuing a halt command from inside the instance has the same effect as a stop/suspend from the AWS console.
I'm impressed that works. (For anyone else surprised that an instance can stop itself, see notes 1 & 2.)
You can also try "sudo halt"; that way you don't need to run R as root, as long as the user account running R is capable of running sudo. This is pretty common on a lot of AMIs on EC2.
Be careful about what constitutes an assumption of R quitting - believe it or not, one can crash R. It may be better to have a separate script that watches the R pid and, once that PID is no longer active, terminates the instance. Doing this command inside of R means that if R crashes, it never reaches the call to halt. If you call it from within another script, that can be dangerous, too. If you know Linux well, what you're looking for is the PID from starting R, which you can pass to another script that checks ps, say every 1 second, and then terminates the instance once the PID is no longer running.
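A minimal sketch of such a watchdog in Python (pass it the R process's PID; it assumes the user it runs as may invoke "sudo halt" without a password, as discussed further down in this thread):

import os
import subprocess
import sys
import time

def pid_alive(pid):
    try:
        os.kill(pid, 0)            # signal 0 checks for existence without sending anything
        return True
    except ProcessLookupError:
        return False
    except PermissionError:
        return True                # the process exists but belongs to another user

if __name__ == "__main__":
    r_pid = int(sys.argv[1])       # PID of the R process to watch
    while pid_alive(r_pid):
        time.sleep(1)
    subprocess.call(["sudo", "halt"])   # or stop the instance via the EC2 API instead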
I think a better solution is to use the EC2 API tools (see: http://docs.amazonwebservices.com/AWSEC2/latest/APIReference/ for documentation) to terminate OR stop instances. There's a difference between the two of these, and it matters if your instance is EBS backed or S3 backed. You needn't run as root in order to terminate the instance - the fact that you have the private key and certificate shows Amazon that you're the BOSS, way above the hoi polloi who merely have root access on your instance.
Because these credentials can be used for mischief, be careful about running API tools from a given server, you'll need your certificate and private key on the server. That's a bad idea in the event that you have a security problem. It would be better to message to a master server and have it shut down the instance. If you have messaging set up in any way between instances, this can do all the work for you.
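As a rough illustration of the API route, here is a sketch using the boto3 Python SDK rather than the old command-line API tools mentioned above. It assumes credentials are available (ideally via an IAM instance profile rather than keys stored on the box) and that the instance metadata service is reachable:

import boto3
import requests

METADATA = "http://169.254.169.254/latest/meta-data"
instance_id = requests.get(METADATA + "/instance-id", timeout=2).text
az = requests.get(METADATA + "/placement/availability-zone", timeout=2).text

ec2 = boto3.client("ec2", region_name=az[:-1])   # region is the AZ minus its trailing letter
ec2.stop_instances(InstanceIds=[instance_id])    # stop (EBS-backed); instance-store instances can only be terminated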
Note 1: Eric Hammond reports that the halt will only suspend an EBS instance, so you still have storage fees. If you happen to start a lot of such instances, this can clutter things up. Your original question seems unclear about whether you mean to terminate or stop an instance. He has other good advice on this page
Note 2: A short thread on the EC2 developers forum gives advice for Linux & Windows users.
Note 3: EBS instances are billed for partial hours, even when restarted. (See this thread from the developer forum.) Having an auto-suspend close to the hour mark can be useful, assuming the R process isn't working, in case one might re-task that instance (i.e. to save on not restarting). Other useful tools to consider: setTimeLimit and setSessionTimeLimit, and various checkpointing tools (I have a Q that mentions a couple). Using an auto-kill is useful if one has potentially badly behaved code.
Note 4: I recently learned of the shutdown command in package fun. This is multi-platform. See this blog post for commentary, and code is here. Dangerous stuff, but it could be useful if you want to adapt to Windows. I haven't tried it, though.
Update 1. Three more ideas:
You could use .Last() and runLast = TRUE for q() and quit(), which could shut down the instance.
If using littler or a script that invokes the script via Rscript, the same command line functions could be used.
My favorite package of today, tcltk2, has a neat timer mechanism called tclTaskSchedule() that can be used to schedule the execution of an expression. You could then go crazy with the execution of stuff just before an hourly interval has elapsed.
system("echo 'rootpassword' | sudo halt")
However, the downside is having your root password in plain text in the script.
AFAIK those ways you mentioned are the only ones. In any case the script will have to run as root to be able to shut down the machine (if you find a way to do it without root that's possibly an exploit). You ask for an easier way but system("halt") is just an additional line at the end of your script.
sudo is an option -- it allows you to run certain commands without prompting for any password. Just put something like this in /etc/sudoers
<username> ALL=(ALL) PASSWD: ALL, NOPASSWD: /sbin/halt
(of course replacing <username> with the name of the user running R) and system('sudo halt') should just work.
