Sending data from Ganglia to Graphite

I am currently collecting monitoring metrics with Ganglia and I would like to graph that data with Graphite. I know such an integration is possible, and I found an article describing how it should be done, but I am not quite sure how the integration actually works, especially if I want to send the data straight into Graphite without parsing gmetad's output myself. Any help on how to integrate Ganglia with Graphite would be great.
thanks

There are two approaches to integrating Ganglia with Graphite:
Use a third-party process that polls gmetad/gmond for metrics, reworks the data format, and finally sends the metrics to the carbon server (see the sketch after this list).
Use gmetad's built-in "graphite integration" feature: you only need to configure the carbon server address, port and protocol (plus an optional Graphite path prefix), and gmetad takes care of everything else. More details can be found in your /etc/ganglia/gmetad.conf.
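For approach #1, the relay process just has to poll gmond's XML dump and rewrite every metric into carbon's plaintext protocol ("name value timestamp"). A minimal sketch of such a relay, assuming gmond's default XML output on TCP 8649 and carbon's plaintext listener on TCP 2003 (the hostnames and the metric prefix are placeholders):

```python
#!/usr/bin/env python
# Minimal gmond -> carbon relay sketch (approach #1).
# Assumes gmond serves its XML dump on TCP 8649 and carbon accepts the
# plaintext protocol ("name value timestamp\n") on TCP 2003.
import socket
import time
import xml.etree.ElementTree as ET

GMOND_HOST, GMOND_PORT = "localhost", 8649              # placeholder
CARBON_HOST, CARBON_PORT = "carbon.example.com", 2003   # placeholder
PREFIX = "ganglia"                                      # placeholder metric prefix

def read_gmond_xml():
    """Read the full XML dump that gmond writes on connect."""
    sock = socket.create_connection((GMOND_HOST, GMOND_PORT))
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:
            break
        chunks.append(data)
    sock.close()
    return b"".join(chunks)

def relay_once():
    now = int(time.time())
    root = ET.fromstring(read_gmond_xml())
    lines = []
    for cluster in root.findall("CLUSTER"):
        for host in cluster.findall("HOST"):
            for metric in host.findall("METRIC"):
                # Skip non-numeric metrics (e.g. strings like os_name).
                if metric.get("TYPE") == "string":
                    continue
                name = "%s.%s.%s" % (PREFIX,
                                     host.get("NAME").replace(".", "_"),
                                     metric.get("NAME"))
                lines.append("%s %s %d\n" % (name, metric.get("VAL"), now))
    carbon = socket.create_connection((CARBON_HOST, CARBON_PORT))
    carbon.sendall("".join(lines).encode("ascii"))
    carbon.close()

if __name__ == "__main__":
    while True:
        relay_once()
        time.sleep(60)   # match your polling interval
```

In production you would keep the carbon connection open and handle timeouts, but the data flow really is just this.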
I would recommend #2 since it's pretty simple; you just need to upgrade your Ganglia packages to version 3.3+.
With either approach you can store the metrics data in both RRD and Whisper. If you don't want that, ganglia-web also supports replacing its rrdtool graphs with Graphite-rendered ones; see "Using Graphite as the graphing engine".
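For approach #2, the relevant settings live in gmetad.conf. A rough sketch of what that looks like, based on the commented examples shipped with Ganglia 3.3+ (check your own /etc/ganglia/gmetad.conf for the exact directive names and defaults your version supports):

```
# /etc/ganglia/gmetad.conf (excerpt) - Graphite integration, Ganglia 3.3+
# Host running carbon-cache
carbon_server "carbon.example.com"
# Port of carbon's plaintext listener (default 2003)
carbon_port 2003
# Optional prefix prepended to every metric path sent to Graphite
graphite_prefix "ganglia"
```

After restarting gmetad, every metric it polls is forwarded to carbon in addition to being written to RRD.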

Have you checked the ganglia-web wiki? There is a section called Graphite Integration and another called Using Graphite as the graphing engine which explain how to do what you want.

I've worked a lot with Ganglia, and from what I've researched Graphite works similarly. I was never able to master Whisper, but I've found RRDs (round-robin databases) to be pretty reliable. Not sure what you're interested in monitoring, but I would definitely check out JMXtrans. You can get the code from Google. It provides multiple methods for extracting metric data from whatever JVM you're monitoring, lets you define which metrics you'd like to pipe to Ganglia/Graphite, and has some other options.

Related

How to control the metrics sent by statsd to graphite?

I am using hosted Graphite and statsd. I want to reduce costs by reducing the number of metrics being sent.
For example: for each timer, I don't want to send all 14 metrics (upper, std...), but only 3 of them (mean_90, sum, and maybe another one).
How can I configure that in the statsd configuration file?
Looking at the statsd docs and its source, I don't think you can configure it to not send all the metrics.
I suggest that you either:
Edit the source code so it only calculates and sends the metrics you want. This is probably easy to do by just deleting the lines where they are calculated.
Configure Graphite to drop all metrics coming from statsd that don't match the three patterns you want (see the sketch below).
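For the second option, one way to do it (assuming you run your own carbon rather than only a hosted endpoint) is carbon's optional whitelist/blacklist feature: enable it in carbon.conf and list one regular expression per line in blacklist.conf for the metrics you want dropped before they are written. A rough sketch, using the standard stats.timers naming statsd produces; extend the pattern to cover everything except the three metrics you want to keep:

```
# carbon.conf
[cache]
USE_WHITELIST = True

# blacklist.conf - one regex per line; matching metric names are dropped
^stats\.timers\..*\.(upper|upper_90|lower|mean|std|count|count_ps|median)$
```

With a hosted Graphite service you don't control carbon, so there you would have to rely on whatever filtering the provider exposes.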

What is the best usage of tsdb?

I found OpenTSDB, which looks like a powerful monitoring system. It stores data points in a structure like proc.loadavg.1min 1234567890 1.35 host=A.
But my questions are:
1. Is it good for logging from PHP?
2. Can I store every kind of log data in it?
3. Is there a good PHP library for working with OpenTSDB (e.g. for sending data to OpenTSDB from PHP)?
It is not yet clear to me. I would be thankful for any help.
In my opinion openTSDB is not a monitoring system but a way to store time series.
If you want to build a monitoring tool you'll need a bigger set of tools, including a way to feed your monitored metrics to the database and a way to display them.
For example you can use Logstash and statsd to collect, aggregate and send your metrics. For the display you can use a tool called Grafana.
OpenTSDB is just one option for storing the data; you could also use Graphite or InfluxDB.
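To give a feel for how simple it is to feed OpenTSDB (and why a dedicated PHP library is not strictly necessary): each data point is a single put line in exactly the format quoted in the question, written to the TSD's line-based port (4242 by default). A minimal sketch in Python; the same few lines translate directly to PHP with fsockopen/fwrite, and the TSD hostname is a placeholder:

```python
import socket
import time

TSD_HOST, TSD_PORT = "tsdb.example.com", 4242   # placeholder TSD address

def send_metric(metric, value, tags):
    """Write one data point using OpenTSDB's line-based 'put' command."""
    tag_str = " ".join("%s=%s" % (k, v) for k, v in tags.items())
    line = "put %s %d %s %s\n" % (metric, int(time.time()), value, tag_str)
    sock = socket.create_connection((TSD_HOST, TSD_PORT))
    sock.sendall(line.encode("ascii"))
    sock.close()

# The data point structure from the question: proc.loadavg.1min 1234567890 1.35 host=A
send_metric("proc.loadavg.1min", 1.35, {"host": "A"})
```

Newer OpenTSDB versions also expose an HTTP API for puts, which may be easier to call from PHP with curl.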

How to build a predictive dialer?

I need to build a reliable predictive dialer based on Asterisk. Currently the system we use includes Wombat and Asterisk, and we do not find this solution usable as Wombat provides a poor API and it's impossible to use it without regular manual operations.
The system we want:
Can be used solely via API or direct database queries (adding lists to campaigns, updating lists, starting campaigns, stopping campaigns etc.) so that it can be completely integrated into an existing product
Is free, or is paid for annually, independent of the usage rate
Is considered stable
Should be able to handle tens of thousands of calls per day, if it matters
Use vicidial.org or hire a freelancer to build a new core with the API you need.
You can also check OSDial for this; it is also developed using Asterisk.
We have been working with a preview of the next version of Wombat through the Early Access program. Wombat has a complete configuration and reporting JSON API, and you can deploy it "headless" in order to scale up to thousands of parallel lines. If you ask Loway they can likely get you access to the Early Access program.
BTW, Vicidial is great for agent-based outbound, but imposes quite a large penalty on the number of agents per server - you cannot reasonably use it to do telecasting at the scale we are looking for, as it would require too many servers. Wombat is leaner and can drive over a thousand channels per server. YMMV.
This question would be better placed on a "hire-a-freelancer" site like oDesk... if you need custom programming done, those are the sorts of places to go to get manpower.
Your specifications are well within what is possible with Asterisk. I'd strongly recommend looking at Vicidial and OSDial as others have suggested; out of the box, they are pretty good.
The hard part of any auto-dialer is not the dialer, oddly enough. It's the prediction algorithms, the answering machine detection algorithms and the agent UI. Those are what makes or breaks an auto-dialer application for a company.

Recommendations using R with SimpleDB or BigQuery or using PHP with SimpleDB

I am currently working on a system that generates product recommendations like those on Amazon: "People who bought this also bought..."
Current Scenario:
Extract the client's Google Analytics data and insert it into a database.
On the client's website, when a product page loads, an API call is made to get recommendations for the product being viewed.
When the API receives a product ID as the request, it looks in the database, retrieves (using association rules) the recommended product IDs, and sends them back as the response.
This list of product IDs is then processed on the client end to get the product details (image, price, ...) and displayed on the website.
Currently I am using PHP and MySQL with the gapi package and a REST API, with storage on Amazon EC2.
My Question is:
Now, if I have to choose amongst the following, which would be the best choice to implement the above-mentioned concept?
PHP with SimpleDB or BIGQuery.
R language with BIGQuery.
RHIPE-(R and hadoop ) with SimpleDB.
Apache Mahout.
Please help!
This isn't so easy to answer, because the constraints are fairly specialized.
The following considerations can be made, though:
BigQuery is not yet public. Thus, with a small user base, it will be harder to get advice on improvements, even if you are in the preview population.
Each of your options pairs a modeling system with a storage system. Apache Mahout is not a storage mechanism, so it won't necessarily work on its own. I used to believe that its machine learning implementations were a pastiche of a few Google Summer of Code projects, but I've updated that view on the suggestion of a commenter. It still looks like it has rather uneven and spotty coverage of different algorithms, and it's not particularly clear how the components are supported or maintained. I encourage an evangelist for Mahout to address this.
As a result, this eliminates the 1st, 2nd, and 4th options.
What I don't quite get is the need for a real-time server to utilize Hadoop and RHIPE. That should be done in your batch processing for developing the recommendation models, not in real-time. I suppose you could use RHIPE as a simple one-stop front end for firing off queries.
I'd recommend using RApache instead of RHIPE, because you can get your packages and models pre-loaded. I see no advantage to using Hadoop in the front end, but it would be a very natural back end system for the model fitting.
(Update 1) Other interface options include RServe (http://www.rforge.net/Rserve/) and possibly RStudio in server mode. There are R/PHP interfaces (see comments below), but I suspect it would be better to access R through HTTP or TCP/IP.
(Update 2) Addressing the whole process, the basic idea I see is that you could query the data from PHP and pass to R or, if you wish to query from within R, look at the link in the comments (to the OmegaHat tools) or post a new question about R & SimpleDB - I'm sure someone else on SO would be able to give better insight on this particular connection. RApache would let you instantiate many R processes already prepared with packages loaded and data in RAM; thus you would only need to pass whatever data needs to be used for prediction. If your new data is a small vector then RApache should be fine, and it seems this is correct for the data being processed in real-time.
If you want a real-time API for recommendations based on data in a database, Apache Mahout does this directly. You want to use ReloadFromJDBCDataModel, put on top a GenericItemBasedRecommender, and use the servlet-based wrapper in the examples module. It's probably a day or two of work to get familiar with the code and customize it to your needs, but it's pretty simple.
When you get past about 100M data points you would need to look at distributing the computation with Hadoop. That's a fair bit more complex. Mahout has a distributed recommender too, which you can customize.
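Whichever stack you pick, the core "people who bought this also bought" step is item co-occurrence counting over past orders; the storage (MySQL, SimpleDB, ...) and the serving layer (PHP, R, Mahout) mostly sit around that. A toy Python sketch of the idea, with made-up data:

```python
from collections import defaultdict
from itertools import combinations

# Each order is the set of product IDs bought together (illustrative data only).
orders = [
    {"p1", "p2", "p3"},
    {"p1", "p2"},
    {"p2", "p4"},
]

# Count how often each pair of products appears in the same order.
co_counts = defaultdict(lambda: defaultdict(int))
for order in orders:
    for a, b in combinations(sorted(order), 2):
        co_counts[a][b] += 1
        co_counts[b][a] += 1

def also_bought(product_id, top_n=3):
    """Products most frequently bought together with product_id."""
    ranked = sorted(co_counts[product_id].items(),
                    key=lambda kv: kv[1], reverse=True)
    return [pid for pid, _ in ranked[:top_n]]

print(also_bought("p1"))   # -> ['p2', 'p3'] for the toy data above
```

The batch job that fills co_counts (or fits proper association rules) runs offline; the real-time API only does the lookup.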

Does anyone know how google analytics processes data?

Anyone have any idea, or know of any articles that discuss, how Google Analytics stores and processes the data that comes in from the urchin calls? Curious about the architecture.
thanks!
Their own docs on "How Data Is Calculated" give you a pretty good idea of what data they collect and how they calculate their metrics:
http://code.google.com/apis/analytics/docs/concepts/gaConceptsOverview.html#howDataIsCalculated
As you mentioned, these calculations are distributed across many machines using Google's homegrown architecture, which includes Map/Reduce:
http://en.wikipedia.org/wiki/MapReduce
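As a toy illustration of the MapReduce idea (purely illustrative, nothing to do with Google's actual internals): counting pageviews per URL becomes a map step that emits (url, 1) for every hit and a reduce step that sums the counts per URL, which is what lets the work be spread across many machines:

```python
from collections import defaultdict

# Toy "hit log": one record per tracking call (illustrative data only).
hits = [
    {"url": "/home", "visitor": "a"},
    {"url": "/pricing", "visitor": "b"},
    {"url": "/home", "visitor": "c"},
]

def map_phase(records):
    """Emit a (key, value) pair per record; runs independently on each shard."""
    for record in records:
        yield record["url"], 1

def reduce_phase(pairs):
    """Sum the values for each key; each key can be reduced on a different machine."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

print(reduce_phase(map_phase(hits)))   # -> {'/home': 2, '/pricing': 1}
```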
I think Analytics is totally closed. However, if you haven't read about Facebook's Scribe, it is probably worth checking out. It is also an extreme example of scalable, distributed logging and analysis.
I don't know specifically about Analytics, but in general Google uses (ehm... invented?) MapReduce.
There are several open-source databases which support Map/Reduce-style queries, e.g. CouchDB, which is a document-oriented database.
These types of applications use geolocation to determine the user's location from the IP address. Additional information is gathered via the JavaScript objects window.navigator (userAgent, platform, language, ...) and screen (dimensions, color depth).
edit:
there is evidence that Google uses its BigTable engine (which works together with MapReduce) for Reader, Maps & YouTube.
On dbms2.com they even say that Analytics uses MapReduce (which could be categorized as "insider knowledge").
