Is there any way to "ETL" data out of Graphite DB (Grafana) - graphite

We are trying to move some of our monitoring away from Grafana / Graphite DB into another system. Is there any way to pull the full DB data into a SQL database?

You can use the tools shipped with Whisper (Graphite's storage backend) to extract data from the .wsp files and then upload it into your DB.
[boss@DPU101 gn_bytes_total_card0]$ whisper-fetch gauge.wsp | head
1499842740 51993482526.000000
1499842800 51014501995.000000
1499842860 51011637567.000000
1499842920 51301789613.000000
1499842980 50994189020.000000
1499843040 50986821344.000000
This tool also allows you to extract data in JSON:
$ whisper-fetch --help
[...]
--json Output results in JSON form
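If the goal is to land this in a SQL database, here is a rough sketch of that pipeline in Python, assuming whisper-fetch is on your PATH; the .wsp file, metric name and SQLite table are placeholders standing in for your real target DB:
import sqlite3
import subprocess

# Dump one Whisper file as "timestamp value" lines (the default output shown above).
output = subprocess.run(
    ["whisper-fetch", "gauge.wsp"],
    capture_output=True, text=True, check=True,
).stdout

rows = []
for line in output.splitlines():
    timestamp, value = line.split()
    if value != "None":  # whisper-fetch prints None for empty slots
        rows.append((int(timestamp), "gauge", float(value)))

conn = sqlite3.connect("metrics.db")
conn.execute("CREATE TABLE IF NOT EXISTS metrics (ts INTEGER, metric TEXT, value REAL)")
conn.executemany("INSERT INTO metrics VALUES (?, ?, ?)", rows)
conn.commit()
conn.close()
Loop that over every .wsp file under your Whisper storage directory and you have a basic ETL job.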

You can use the Whisper utilities provided with Graphite. They need to be installed separately, using the following command on Ubuntu 14.04:
apt-get install python-whisper
The whisper-fetch.py program will let you dump the data in JSON format (or in a pretty, tab-separated format).
In this example the data points are spaced 60 seconds apart (one per retention interval).
Whisper Link
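For instance (the .wsp path is an assumption; on a typical Ubuntu install the files live under /var/lib/graphite/whisper, with directories mirroring the metric's dotted name):
whisper-fetch.py --json /var/lib/graphite/whisper/servers/web01/cpu/user.wsp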

Related

Efficient method to move data from Oracle (SQL Developer) to MS SQL Server

Daily, I query a few tables in SQL Developer, filtering to the prior day's activity, adding a column to date-stamp the data, then exporting to xlsx. Then I manually import each file into MS SQL Server via the SQL Server Import and Export Wizard. It takes many clicks and much waiting...
I'm essentially creating an archive in SQL Server; the application I'm querying overwrites its data daily. I'm not a DBA of either database, and I use the archived data to do validations and research.
It's tough to get my org to provide additional software, so I've been trying to make this work with SQL Developer, SSMS Express edition, and other standard tools.
I'm looking to make this reasonably automated, via scripts, scheduled tasks, etc. I'd appreciate suggestions that would work in my current situation, but if that isn't reasonable and there's a very reasonable alternative, I can go back to the org to request software/access/assistance.
You can use SSIS to import the data directly from Oracle to SQL Server, unless you need the .xlsx files for another purpose. If you do need the files, you can export from Oracle to .xlsx first and then load them into SQL Server from those files.
For the date-stamp column, a Derived Column can be added within a Data Flow Task using the SSIS GETDATE() function. GETDATE() returns a timestamp; if only the date is needed, the (DT_DBDATE) cast converts it to a date type compatible with SQL Server's date data type, e.g. (DT_DBDATE)GETDATE().
Once you have the SSIS package configured, you can schedule it to run at regular intervals as a SQL Agent job. I'd also recommend installing the SSIS catalog (SSISDB) and using it as the source to run the packages from. The following links shed more light on these areas.
SSIS
Connecting to Oracle from SSIS
Data Flow Task
Derived Column Transformation
Creating SQL Server Agent Jobs for SSIS packages
SSIS Catalog
Another option you may consider (if it is supported in SQL Express) is the BCP utility, which can be run from the command line.
The BCP utility allows you to bulk copy the data from a delimited text file into a SQL Server table.
If you go with this approach, things to consider:
The number of columns in the source file needs to match the number of columns in the destination table.
Data types must match (or be compatible).
Typically, empty strings will be converted to NULLs, so you will need to consider whether the columns are nullable.
(To name a few. If you want to delve deeper, you might also need to look at custom delimiters between fields and records; don't forget that commas and line feeds are still valid characters in char-type fields.)
Anyhow, maybe it will work for you, maybe not. You might still have to deal with exporting the data from Oracle, but it might ease the pain of getting the data in.
Have a read:
https://learn.microsoft.com/en-us/sql/tools/bcp-utility?view=sql-server-2017
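As a rough illustration only (server, database, table and file names are placeholders; -T assumes Windows authentication, -c selects character mode, and -t sets the field terminator), an import of a comma-delimited extract might look like:
bcp ArchiveDb.dbo.DailyArchive in daily_extract.csv -S YOURSERVER -T -c -t,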

download data from Graphite using Python

I am a beginner so please be kind. I want to download the CPU utilization rate from some VMs installed on a server. The server has Graphite installed. I installed the Python graphite-api package and I have the server connection details. How do I make the REST API call to start pulling the data?
Use the requests package:
>>> r = requests.get('https://your_graphite_host.com/render?target=app.numUsers&format=json', auth=('user', 'pass'))
>>> r.json() # this will give you the JSON response with the data
Keep in mind that you will have to replace app.numUsers with the appropriate metric name. You can also request other formats and time ranges; see the graphite-api docs.
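For example, to pull the last hour and walk the datapoints (the host, credentials and metric name are placeholders; the render API returns a list of series whose datapoints are [value, timestamp] pairs):
import requests

resp = requests.get(
    "https://your_graphite_host.com/render",
    params={"target": "app.numUsers", "format": "json", "from": "-1h"},
    auth=("user", "pass"),
)
resp.raise_for_status()

for series in resp.json():
    print(series["target"])
    for value, timestamp in series["datapoints"]:
        if value is not None:  # gaps come back as null
            print(timestamp, value)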

Is it possible to access elasticsearch's internal stats via Kibana

I can see from querying our Elasticsearch nodes that they contain internal statistics that, for example, show disk, memory and CPU usage (e.g. via the GET _nodes/stats API).
Is there any way to access these in Kibana 4?
Not directly, as Elasticsearch doesn't natively push its internal statistics to an index. However, you could easily set something like this up on a *nix box:
Poll your Elasticsearch box via REST periodically (say, once a minute). The /_status or /_cluster/health endpoints probably contain what you're after (see the sketch after this list).
Pipe these to a log file in a simple CSV format along with a time stamp.
Point logstash to these log files and forward the output to your ElasticSearch box.
Graph your data.
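A minimal sketch of the first two steps in Python, meant to run as a simple background job (the host, output path and the chosen health fields are assumptions):
import csv
import time

import requests

ES_URL = "http://localhost:9200/_cluster/health"  # adjust to your node

while True:
    health = requests.get(ES_URL).json()
    with open("/var/log/es_health.csv", "a", newline="") as f:
        csv.writer(f).writerow([
            int(time.time()),          # timestamp
            health.get("status"),      # green / yellow / red
            health.get("number_of_nodes"),
            health.get("active_shards"),
        ])
    time.sleep(60)  # poll once a minute, as suggested above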

MongoDB old Databases & collections not accessible - Mongod.exe and Mongo.exe running

I'm a MongoDB beginner.
I'm working in the IntelliJ IDEA IDE to develop a Java program that runs data mining processes on social media like Twitter and Facebook, based on Twitter4J and Facebook4J.
I use MongoDB to store database collections for test and evaluation purposes. I have saved several MongoDB databases, which were all accessible until a few days ago, in a folder at E:/data/db. So all my previous databases are in E:/data/db, and I could easily inspect the structure of the database collections from the Windows shell (show dbs, show collections, db.stats()).
Last week I started a new data mining collection run, with several collections, and probably made a mistake with the location of the database on my computer: I put the new database in E:/data/db/newdatabase.
The problem is that I need to keep the data mining process running while I analyze the old database collections with the R software.
Right now I'm not able to access the old MongoDB databases from the Windows terminal; I can only see that there are some bytes, but no structured collections, etc. When I try to access the collections and databases from R with the rmongodb package, I'm not able to see the previous collections.
Might I be able to restore the old database collections with mongorestore or something like that? What kind of mistake could I have made that leaves these old database collections inaccessible when they were fine a few days ago?
MongoDB is not intended to be manipulated at the filesystem level.
Instead, you should be using mongoexport and mongoimport to transfer individual databases.
Check if you still have your data files in your E:\data\db directory, named after your database:
yourDatabase.0
yourDatabase.1
yourDatabase.ns
Try copying your newdatabase and db folders into a new folder (like E:\backup), then start two mongod instances:
mongod --dbpath=E:\backup\db --port 27001
mongod --dbpath=E:\backup\newdatabase --port 27002
Try to connect to each database and check that everything is OK (no data corruption, ...):
mongo --port 27001
mongo --port 27002
If everything is OK then, as jmkgreen explained, export your database and import it into your previous database.
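For example (the ports match the temporary instances above; the database and collection names are placeholders, and the import assumes your main mongod runs on the default port 27017):
mongoexport --port 27002 --db newdatabase --collection tweets --out tweets.json
mongoimport --port 27017 --db mydb --collection tweets --file tweets.json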

Connect R to POP Email Server (Gmail)

Is it possible to have R connect to Gmail's POP server and read/download the messages in a specific folder of mine? I have been storing emails and would like to go back and start analyzing subject lines, etc.
Basically, I need a way to export a folder in my Gmail account, and I would like to do this programmatically if at all possible.
Thanks in advance!
I am not sure this can be done with a single command. Maybe there is a package out there that I am not aware of which can accomplish it, but as long as you do not find one, the following process may be a solution...
Consider got-your-back (http://code.google.com/p/got-your-back/wiki/GettingStarted#Step_4%3a_Performing_A_Backup) which "is a command line tool that backs up and restores your Gmail account".
You can invoke it like this (given that python is available on your machine):
python gyb.py --email foo@bar.com --search "from:pip@pop.com" --folder "mail_from_pip"
After completion you'll find all the emails matching the --search in the specified --folder, along with a sqlite database. (posted by dukedave, Dec 4 '11)
So depending on your OS you should be able to invoke the above command from within R and then access the downloaded mails in the respective folder.
GotYourBack is a good backup utility, but for downloading metadata for analysis, you might want something that doesn't first require you to fetch the entire content of all your email.
I've recently used the gmailr package to do a similar analysis.
