What is the best usage of TSDB? - OpenTSDB

I found OpenTSDB to be a powerful monitoring system. It stores data points in a structure like proc.loadavg.1min 1234567890 1.35 host=A.
But my questions are:
1- Is it good for logging from PHP?
2- Can I store every kind of log data in it?
3- And please let me know if there is a good PHP library for working with OpenTSDB, e.g. for sending data to OpenTSDB from PHP.
It is not yet clear to me.
I would be thankful for any help.

In my opinion OpenTSDB is not a monitoring system but a way to store time series.
If you want to build a monitoring tool you'll need a bigger set of tools, including a way to feed your monitored metrics to the database and a way to display them.
For example, you can use Logstash and StatsD to collect, aggregate, and send your metrics. For display you can use a tool called Grafana.
OpenTSDB is just one option for storing them; you could also use Graphite or InfluxDB.
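To make the "send data to OpenTSDB" part concrete, here is a minimal sketch that pushes one data point through OpenTSDB 2.x's HTTP /api/put endpoint. It is written in Java only for illustration; the very same POST can be issued from PHP with curl or Guzzle, so you do not strictly need a dedicated PHP library. The host name is a placeholder, and 4242 is OpenTSDB's default port.

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class OpenTsdbPut {
        public static void main(String[] args) throws Exception {
            // One data point in the shape the question describes:
            // metric name, Unix timestamp, value, and a host tag.
            String json = "{\"metric\":\"proc.loadavg.1min\","
                    + "\"timestamp\":" + (System.currentTimeMillis() / 1000) + ","
                    + "\"value\":1.35,"
                    + "\"tags\":{\"host\":\"A\"}}";

            // opentsdb.example.com is a placeholder for your TSD host.
            URL url = new URL("http://opentsdb.example.com:4242/api/put");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setDoOutput(true);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(json.getBytes(StandardCharsets.UTF_8));
            }
            // OpenTSDB answers 204 No Content when the point is accepted.
            System.out.println("HTTP " + conn.getResponseCode());
        }
    }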

Related

How to migrate data from OpenTSDB to TDengine?

I'm trying to migrate historical data that used to be stored in OpenTSDB to TDengine, as I'm struggling with setting up OpenTSDB's multiple components (TSD, ZooKeeper, etc.). Does TDengine provide any way to do this?
There is a tool named taosAdapter.
It is compatible with OpenTSDB's JSON and Telnet write formats.
For historical data migration, you can try DataX.
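If you go the taosAdapter route, a write is just an HTTP POST of the same OpenTSDB-style JSON. The sketch below assumes taosAdapter's OpenTSDB-compatible endpoint (/opentsdb/v1/put/json/<db> on port 6041) and the default root/taosdata credentials; both are assumptions you should verify against your TDengine release.

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    import java.util.Base64;

    public class TaosAdapterPut {
        public static void main(String[] args) throws Exception {
            // The same JSON body that OpenTSDB's /api/put accepts.
            String json = "[{\"metric\":\"proc.loadavg.1min\",\"timestamp\":1234567890,"
                    + "\"value\":1.35,\"tags\":{\"host\":\"A\"}}]";

            // Assumed endpoint path, port, and credentials; check your taosAdapter docs.
            URL url = new URL("http://tdengine.example.com:6041/opentsdb/v1/put/json/mydb");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "application/json");
            String auth = Base64.getEncoder()
                    .encodeToString("root:taosdata".getBytes(StandardCharsets.UTF_8));
            conn.setRequestProperty("Authorization", "Basic " + auth);
            conn.setDoOutput(true);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(json.getBytes(StandardCharsets.UTF_8));
            }
            System.out.println("HTTP " + conn.getResponseCode());
        }
    }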

How can I calculate how much storage an existing Kudu table actually uses

I would like to calculate how big (in GB) an existing Kudu table actually is.
Does anybody know how to do this?
The only way I'm aware of to do this is to collect the relevant metrics over HTTP from the individual tablet servers.
Example call: curl -s 'http://tserver.host:8050/metrics?include_schema=1&metrics=on_disk_size'
You can pull those metrics and work from there.
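As a rough sketch of "work from there": fetch that endpoint from each tablet server and add up the on_disk_size values. This assumes the metrics JSON contains objects shaped like {"name":"on_disk_size","value":N}; a real implementation should use a proper JSON parser and loop over every tablet server in the cluster (the host below is a placeholder).

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class KuduOnDiskSize {
        public static void main(String[] args) throws Exception {
            // Same endpoint as the curl call above.
            URL url = new URL("http://tserver.host:8050/metrics?include_schema=1&metrics=on_disk_size");
            StringBuilder body = new StringBuilder();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    body.append(line);
                }
            }

            // Crude extraction of every on_disk_size value in the response.
            Pattern p = Pattern.compile("\"name\"\\s*:\\s*\"on_disk_size\"\\s*,\\s*\"value\"\\s*:\\s*(\\d+)");
            Matcher m = p.matcher(body);
            long totalBytes = 0;
            while (m.find()) {
                totalBytes += Long.parseLong(m.group(1));
            }
            System.out.printf("on_disk_size total: %.2f GB%n", totalBytes / (1024.0 * 1024 * 1024));
        }
    }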

Zenoss: graph results of a PostgreSQL database query

So I just added my servers to Zenoss and installed some PostgreSQL ZenPacks. Unfortunately, I do not care for most of the PostgreSQL monitoring tools. Instead of what they gave me, I am wondering if it is possible to run a custom query that I wrote and then graph the result using Zenoss. How do I go about doing this? Are there any good resources that you know of?
Thanks.
You will need to write your own ZenPack (or at least a template). Check the Development Guide: http://wiki.zenoss.org/ZenPack_Development_Guide or http://zenosslabs.readthedocs.org/en/latest/zenpack_development/
IMHO you will need a zencommand datasource, which will execute your custom SQL query; the query output (a number only) becomes the metric value, which Zenoss then processes.
Or you can expose the metric(s) via SNMP, in which case it is just a standard SNMP metric in Zenoss.
It's up to you how you implement it. I recommend using the community forum http://www.zenoss.org/forum for Zenoss-related questions.
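To illustrate the zencommand idea, here is a minimal sketch of a command Zenoss could run: it executes one SQL query against PostgreSQL and prints the single numeric result on stdout, which the datasource then maps onto a datapoint. The connection details, query, and table name are placeholders, and the PostgreSQL JDBC driver is assumed to be on the classpath; in practice a one-line psql command would do the same job.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class PgMetric {
        public static void main(String[] args) throws Exception {
            // Placeholder connection details for the monitored PostgreSQL server.
            String url = "jdbc:postgresql://db.example.com:5432/mydb";
            try (Connection conn = DriverManager.getConnection(url, "zenoss", "secret");
                 Statement stmt = conn.createStatement();
                 // Placeholder query: any statement returning a single numeric value works.
                 ResultSet rs = stmt.executeQuery("SELECT count(*) FROM orders WHERE status = 'open'")) {
                if (rs.next()) {
                    // Bare number on stdout, which the zencommand datasource turns into a metric.
                    System.out.println(rs.getLong(1));
                }
            }
        }
    }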

Sending data from Ganglia to Graphite

I am currently collecting monitoring metrics with Ganglia and I would like to show graphs of that data with Graphite. I know such an integration is possible, and I found an article describing how it should be done. I am not quite sure exactly how this integration works, especially since I want to send the data straight into Graphite without parsing gmetad's output. Any help on how to integrate Ganglia with Graphite would be great.
Thanks.
There are two approaches to integrating Ganglia with Graphite:
1. Use a third-party process to fetch metrics from gmetad/gmond, rewrite them into Graphite's format, and finally send them to the carbon server.
2. Use gmetad's "graphite integration" feature, where you only need to configure the carbon server address, port, and protocol (with an optional Graphite path syntax), and gmetad does the rest. More details can be found in your /etc/ganglia/gmetad.conf; a sample of the relevant settings is shown after this answer.
I would recommend #2 since it's pretty simple; you just need to upgrade your Ganglia packages to version 3.3+.
With either solution you can store metrics data in both RRD and Whisper. If you don't want that, ganglia-web also supports rendering graphs with Graphite instead of rrdtool; see "Using Graphite as the graphing engine".
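As a rough illustration of approach #2, the relevant gmetad.conf settings look something like the lines below (option names as shipped with Ganglia 3.3+; newer versions add further options such as the carbon protocol, so check the comments in your own /etc/ganglia/gmetad.conf):

    # Forward every metric gmetad aggregates to a Graphite/carbon server.
    carbon_server "graphite.example.com"
    carbon_port 2003
    # Optional prefix prepended to each metric path sent to carbon.
    graphite_prefix "ganglia"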
Have you checked the ganglia-web wiki? There is a section called "Graphite Integration" and another called "Using Graphite as the graphing engine" which explain well how to do what you want.
I've worked a lot with Ganglia, and from what I've researched Graphite works similarly. I was never able to master Whisper, but I've found RRDs (round-robin databases) to be pretty reliable. Not sure what you're interested in monitoring, but I would definitely check out JMXtrans. You can get the code from Google Code. It provides multiple methods for extracting metric data from whatever JVM you're monitoring, and lets you define which metrics you'd like to pipe to Ganglia/Graphite, among other options.

Recommendations using R with SimpleDB or BigQuery or using PHP with SimpleDB

I am currently working on a system that generates product recommendations like those on Amazon: "People who bought this also bought this..."
Current Scenario:
Extract the client's Google Analytics data and insert it into a database.
On the client's website, when a product page loads, an API call is made to get recommendations for the product being viewed.
When the API receives a product ID in the request, it looks in the database, retrieves (using association rules) the recommended product IDs, and sends them back as the response.
This list of product IDs is then processed at the client end to get the product details (image, price, ...) and displayed on the website.
Currently I am using PHP and MySQL with the gapi package and a REST API, with storage on Amazon EC2.
My Question is:
Now, if I have to choose amongst the following, which will be the best choice to implement the above mentioned concept.
1. PHP with SimpleDB or BigQuery.
2. R with BigQuery.
3. RHIPE (R and Hadoop) with SimpleDB.
4. Apache Mahout.
Please help!
This isn't so easy to answer, because the constraints are fairly specialized.
The following considerations can be made, though:
BigQuery is not yet public. Thus, with a small user base, even if you are in the preview population, it will be harder to get advice on improvements.
Each of your options combines a modeling system and a storage system. Apache Mahout is not a storage mechanism, so it won't necessarily work on its own. I used to believe that its machine learning implementations were a pastiche of a few Google Summer of Code projects, but I've updated that view on the suggestion of a commenter. It still looks like it has rather uneven and spotty coverage of different algorithms, and it's not particularly clear how the components are supported or maintained. I encourage an evangelist for Mahout to address this.
As a result, this eliminates the 1st, 2nd, and 4th options.
What I don't quite get is the need for a real-time server to utilize Hadoop and RHIPE. That should be done in your batch processing for developing the recommendation models, not in real-time. I suppose you could use RHIPE as a simple one-stop front end for firing off queries.
I'd recommend using RApache instead of RHIPE, because you can get your packages and models pre-loaded. I see no advantage to using Hadoop in the front end, but it would be a very natural back end system for the model fitting.
(Update 1) Other interface options include RServe (http://www.rforge.net/Rserve/) and possibly RStudio in server mode. There are R/PHP interfaces (see comments below), but I suspect it would be better to access R through HTTP or TCP/IP.
(Update 2) Addressing the whole process, the basic idea I see is that you could query the data from PHP and pass to R or, if you wish to query from within R, look at the link in the comments (to the OmegaHat tools) or post a new question about R & SimpleDB - I'm sure someone else on SO would be able to give better insight on this particular connection. RApache would let you instantiate many R processes already prepared with packages loaded and data in RAM; thus you would only need to pass whatever data needs to be used for prediction. If your new data is a small vector then RApache should be fine, and it seems this is correct for the data being processed in real-time.
If you want a real-time API for recommendations based on data in a database, Apache Mahout does this directly. You would use a ReloadFromJDBCDataModel, put a GenericItemBasedRecommender on top of it, and use the servlet-based wrapper in the examples module. It's probably a day or two of work to get familiar with the code and customize it to your needs, but it's pretty simple.
Once you get past about 100M data points you would need to look at distributing the computation with Hadoop. That's a fair bit more complex. Mahout has a distributed recommender too, which you can customize.
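For reference, here is a minimal sketch of the setup described above, using the old Mahout 0.x "Taste" API. The MySQL connection details and the table/column names are placeholders for your own schema, and in a real deployment the recommender would sit behind the servlet wrapper rather than in a main method.

    import java.util.List;
    import com.mysql.jdbc.jdbc2.optional.MysqlDataSource;
    import org.apache.mahout.cf.taste.impl.model.jdbc.MySQLJDBCDataModel;
    import org.apache.mahout.cf.taste.impl.model.jdbc.ReloadFromJDBCDataModel;
    import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
    import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.recommender.RecommendedItem;
    import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

    public class ProductRecommender {
        public static void main(String[] args) throws Exception {
            // Placeholder MySQL connection details.
            MysqlDataSource ds = new MysqlDataSource();
            ds.setServerName("db.example.com");
            ds.setDatabaseName("shop");
            ds.setUser("mahout");
            ds.setPassword("secret");

            // Placeholder table/columns holding the user-product interaction data.
            MySQLJDBCDataModel jdbcModel =
                    new MySQLJDBCDataModel(ds, "purchases", "user_id", "product_id", "preference", "timestamp");
            // Keep an in-memory copy of the JDBC-backed model, reloadable as data changes.
            DataModel model = new ReloadFromJDBCDataModel(jdbcModel);

            ItemSimilarity similarity = new LogLikelihoodSimilarity(model);
            GenericItemBasedRecommender recommender = new GenericItemBasedRecommender(model, similarity);

            // "People who bought this also bought": items most similar to the viewed product.
            List<RecommendedItem> similarItems = recommender.mostSimilarItems(12345L, 10);
            for (RecommendedItem item : similarItems) {
                System.out.println(item.getItemID() + " " + item.getValue());
            }
        }
    }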
