I'm trying to migrate historical data that used to be stored in OpenTSDB to TDengine, as I'm struggling with setting up OpenTSDB's multiple components (TSD, ZooKeeper, etc.). Does TDengine provide any way to serve this purpose?
There is a tool named taosAdapter. It is compatible with the OpenTSDB JSON and Telnet write formats.
For historical data migration, you can try DataX.
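As a minimal sketch of the write path (assuming taosAdapter is listening on its default REST port 6041, that a target database named migrated already exists, and that the endpoint path matches your TDengine version), a data point in OpenTSDB's JSON put format can be posted like this:

    import requests

    # OpenTSDB-style JSON data point; taosAdapter maps the metric to a super
    # table and the tags to child-table tags.
    point = {
        "metric": "proc.loadavg.1min",
        "timestamp": 1650000000,
        "value": 1.35,
        "tags": {"host": "A"},
    }

    resp = requests.post(
        "http://localhost:6041/opentsdb/v1/put/json/migrated",  # assumed endpoint/port
        json=[point],
        auth=("root", "taosdata"),  # default TDengine credentials, change as needed
        timeout=10,
    )
    resp.raise_for_status()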
Scenario:
I've got a semi-structured dataset in JSON format. I'm storing the 3 subsets (new_records, updated_records, and deleted_records) from the dataset in 3 different Amazon DynamoDB tables, scheduled to be truncated and loaded daily.
I'm trying to create a mapping that sources data from these DynamoDB tables, appends a few metadata columns (date_created, date_modified, is_active), and consolidates the data in a master DynamoDB table.
Issues and Challenges:
I tried AWS Glue: I created a Data Catalog for the source tables using a Crawler. I understand AWS Glue doesn't provide a way to store data in DynamoDB, so I changed the target to Amazon S3. However, the AWS Glue job results in some sort of reduced form of the data (Parquet objects) in my Amazon S3 bucket. I have limited experience with PySpark, Pig, and Hive, so excuse me if I'm unable to explain this clearly.
Quick research on Google suggested reading the Parquet objects on Amazon S3 using Amazon Athena or Redshift Spectrum.
I'm not sure, but this looks like overkill, doesn't it?
I read about AWS Data Pipeline, which offers to quickly transfer data between different AWS services, although I'm not sure whether it provides a mechanism to create mappings between source and target (in order to append additional columns) or whether it straightaway dumps data from one service to the other.
Can anyone hint at a lucid and minimalistic solution?
-- Update --
I've been able to consolidate the data from Amazon DynamoDB to Amazon Redshift using AWS Glue, which actually turned out to be quite simple.
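For reference, the job boils down to something like the sketch below: read one of the crawled source tables from the Data Catalog, append the metadata columns, and write the result to Redshift through a Glue connection. The catalog database, table, connection, and target table names here are placeholders, not my actual ones.

    import sys

    from awsglue.context import GlueContext
    from awsglue.dynamicframe import DynamicFrame
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext
    from pyspark.sql import functions as F

    args = getResolvedOptions(sys.argv, ["JOB_NAME", "TempDir"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read one of the crawled source tables (placeholder catalog names).
    src = glue_context.create_dynamic_frame.from_catalog(
        database="my_catalog_db", table_name="new_records"
    )

    # Append the metadata columns with plain Spark transformations.
    df = (
        src.toDF()
        .withColumn("date_created", F.current_timestamp())
        .withColumn("date_modified", F.current_timestamp())
        .withColumn("is_active", F.lit(True))
    )

    # Write to Redshift through a Glue JDBC connection (placeholder names).
    glue_context.write_dynamic_frame.from_jdbc_conf(
        frame=DynamicFrame.fromDF(df, glue_context, "consolidated"),
        catalog_connection="my-redshift-connection",
        connection_options={"dbtable": "public.master_records", "database": "dev"},
        redshift_tmp_dir=args["TempDir"],
    )

    job.commit()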
However, Amazon Redshift has a few characteristic issues: its relational nature and its inability to directly perform a single merge or upsert to update a table are the major things I'm considering here.
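The workaround I'm looking at is the usual staging-table pattern: have the job land each batch in a staging table, then delete-and-insert against the master table in one transaction. Roughly, with psycopg2 (connection details, table, and key names are placeholders):

    import psycopg2

    # Placeholder connection details.
    conn = psycopg2.connect(
        host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439,
        dbname="dev",
        user="awsuser",
        password="change-me",
    )

    # psycopg2 runs the block below as a single transaction and commits on exit.
    with conn, conn.cursor() as cur:
        # Remove rows that are about to be replaced by the staged batch.
        cur.execute("""
            DELETE FROM public.master_records
            USING public.master_records_stage s
            WHERE public.master_records.record_id = s.record_id
        """)
        # Insert the new versions from the staging table.
        cur.execute(
            "INSERT INTO public.master_records SELECT * FROM public.master_records_stage"
        )

    conn.close()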
I'm considering whether Amazon Elasticsearch can be used here to index and consolidate the data from Amazon DynamoDB.
I'm not sure about your needs and assumptions, but let me post my thoughts; they may help!
Why are you planning to do this migration? Think about this carefully.
Moving from 3 tables to 1 table, table size should not be an issue with DynamoDB, but think about read/write capacity units.
Athena is a good option: you write SQL to query your data and you pay based on the data scanned by your query. But Athena has a 30-minute query timeout. (I think you can request an increase for that, but I'm not sure!) See the sketch after these notes for what querying from code looks like.
I think it is worth trying Data Pipeline. Yes, you can process the data while moving it.
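If you go the Athena route, firing a query from code is just a few boto3 calls. A sketch, where the region, database, table, and S3 output location are placeholders:

    import time

    import boto3

    athena = boto3.client("athena", region_name="us-east-1")

    # Start the query; Athena writes the result set to the given S3 location.
    qid = athena.start_query_execution(
        QueryString="SELECT record_id, is_active FROM master_records LIMIT 10",
        QueryExecutionContext={"Database": "my_catalog_db"},
        ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
    )["QueryExecutionId"]

    # Poll until the query finishes (simplified; no backoff or error details).
    while True:
        execution = athena.get_query_execution(QueryExecutionId=qid)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)

    if state == "SUCCEEDED":
        rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
        for row in rows:
            print([col.get("VarCharValue") for col in row["Data"]])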
What if we have copies of the persistent storage files (blobs) for a given Kusto database and want to be able to access them outside Kusto? Is there any way or API available for reading these files? It appears that these are binary files in Kusto's proprietary format, so they can't just be read without some sort of API/bridge available from Kusto.
There is an API for accessing Kusto data through Kusto: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/api/.
You really don't want to access the blobs directly as they are stored in a heavily compressed and indexed column store format. You would have to replicate most of the Kusto database engine to do so. To do it right, you would effectively end up building another node on your Kusto cluster locally, and it's not clear that you would gain anything from that. For example, you'd be further from the data, so your queries would be slower. Better to just ask your Kusto cluster to do the work and send the results.
If you need to access the data using another platform, you can export it with the .export command.
If you really need to access the data directly, and are willing to sacrifice some performance, then your best bet is probably to store the data outside Kusto and map it as an external table, or use one of the SQL plugins to query the data in its native format.
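To illustrate the "let the cluster do the work" route via the API linked above, here is a minimal sketch using the azure-kusto-data Python package; the cluster URI, database, and query are placeholders:

    from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

    # Placeholder cluster and database names.
    cluster = "https://mycluster.westeurope.kusto.windows.net"
    kcsb = KustoConnectionStringBuilder.with_aad_device_authentication(cluster)
    client = KustoClient(kcsb)

    # Run an ordinary KQL query; the engine reads its own blobs and returns rows.
    response = client.execute("MyDatabase", "MyTable | summarize count() by bin(Timestamp, 1h)")

    for row in response.primary_results[0]:
        print(row)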
If you want to access Kusto data from a non-Kusto environment, you need to move the data out of Kusto into SQL or blob storage using the .export command.
https://learn.microsoft.com/en-us/azure/kusto/management/data-export/
The information isn't duplicated by ADX; it's indexed and compressed by ADX to enable an ad-hoc, interactive exploration experience.
In addition to the Kusto APIs, you can query the data in Kusto using the Kusto (ADX) Spark connector.
I am planning to create a SQLite table in my Android app. The data comes from the server via a web service.
I would like to know the best way to do this.
Should I transfer the data from the web service as a SQLite DB file and merge it, should I get all the data as a SOAP response and parse it into the table, or should I use a REST call?
The general size of the data is 2MB with 100 columns.
Please advise on the best approach so I can get this data quickly, with less load on the device.
My Workflow is:
Download a set of 20,000 addresses and save them to the device's SQLite database. This operation happens only once, when you run the app for the first time or when you want to refresh the whole app data.
Update these records whenever there is a change on the server.
Now I can get this data from the server either as JSON, as XML, or as a plain SQLite file. I want to know the fastest way to store this data in the Android database.
I tried all the above methods and found that getting the database file from the server and copying its data into the database is faster than getting the data as XML or JSON and parsing it. Please advise if I am right or wrong.
If you are planning to use sync adapters then you will need to implement a content provider (or at least a stub) and an authenticator. Here is a good example that you can follow.
Also, you have not explained enough about the use case of this web service to decide what web-service architecture to suggest. But REST is a good style for writing your services, and using JSON over XML is advisable due to data-format efficiency (or, better yet, give Protocol Buffers a shot).
And yes, sync adapters are better to use, as they already provide a great set of features (e.g., periodic sync, auto sync, exponential backoff, etc.) that you would otherwise have to implement yourself in a background service.
To put less load on the device, you can implement a sync adapter backed by a content provider. You serialize/deserialize data when you upload/download it to/from the server. When you need to persist data from the server, you can use the bulkInsert() method of the content provider and persist all your data in a single transaction.
I found OpenTSDB to be a powerful monitoring system. It stores data points with a structure like proc.loadavg.1min 1234567890 1.35 host=A.
But my questions are:
1- Is it good for logging from PHP?
2- Can I store every kind of log data in it?
3- Please let me know if there is a good PHP library for working with OpenTSDB (e.g., to send data to OpenTSDB from PHP).
It is not yet clear to me.
I would be thankful for any help.
In my opinion, OpenTSDB is not a monitoring system but a way to store time series.
If you want to build a monitoring tool you'll need a bigger set of tools, including a way to feed your monitored metrics to the database and a way to display them.
For example, you can use Logstash and StatsD to collect, aggregate, and send your metrics. For the display, you can use a tool called Grafana.
OpenTSDB is just one option for storing the data; you could also use Graphite or InfluxDB.
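As for question 3: I don't know of a canonical PHP client, but OpenTSDB exposes a plain HTTP /api/put endpoint (default port 4242), so any language with an HTTP client can push data points. A minimal sketch in Python, just to show the shape of the call (host, metric, and tags are placeholders; the same JSON body can be sent from PHP with curl):

    import time

    import requests

    point = {
        "metric": "proc.loadavg.1min",
        "timestamp": int(time.time()),
        "value": 1.35,
        "tags": {"host": "A"},
    }

    # POST a single data point (a list of points is also accepted).
    resp = requests.post("http://opentsdb.example.com:4242/api/put", json=point, timeout=5)
    resp.raise_for_status()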
I am currently collecting monitoring metrics with Ganglia, and I would like to graph that data with Graphite. I know such an integration is possible, and I found an article describing how it should be done. I am not quite sure exactly how this integration works, especially if I want to send the data straight into Graphite without parsing gmetad's output. Any help on how to integrate Ganglia with Graphite would be great.
thanks
There are two approaches to integrating Ganglia with Graphite.
Use a third-party process to fetch metrics from gmetad/gmond, convert the metric data format, and finally send the metric data to the Carbon server (there is a small sketch of this after these points).
Use gmetad's "graphite integration" feature, where you just need to configure the Carbon server address, port, and protocol (with an optional Graphite path syntax), and gmetad will do the rest. More details can be found in your /etc/ganglia/gmetad.conf.
I would recommend #2 since it's pretty simple; you just need to upgrade your Ganglia packages to version 3.3+.
With the above solutions, you can store metric data in both RRD and Whisper. If you don't want that approach, ganglia-web also supports replacing rrdtool graphs with Graphite; see "Using Graphite as the graphing engine".
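To make approach #1 concrete: the Carbon side only needs the plaintext protocol, i.e. one "metric.path value timestamp" line per metric over TCP (port 2003 by default). A minimal sketch of the third-party sender, with a placeholder host and metric path:

    import socket
    import time

    CARBON_HOST = "graphite.example.com"  # placeholder
    CARBON_PORT = 2003                    # default Carbon plaintext port


    def send_to_carbon(path, value, timestamp=None):
        """Send one metric using Carbon's plaintext protocol."""
        timestamp = int(timestamp or time.time())
        line = f"{path} {value} {timestamp}\n"
        with socket.create_connection((CARBON_HOST, CARBON_PORT), timeout=5) as sock:
            sock.sendall(line.encode("ascii"))


    # e.g. a load average scraped from gmond's XML output
    send_to_carbon("ganglia.hostA.load_one", 1.35)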
Have you checked the ganglia-web wiki? There is a section called Graphite Integration and another called Using Graphite as the graphing engine which explain well how to do what you want.
I've worked a lot with Ganglia, and from what I've researched Graphite works similarly. I was never able to master Whisper, but I've found RRDs (round-robin databases) to be pretty reliable. I'm not sure what you're interested in monitoring, but I would definitely check out JMXtrans; you can get the code from Google. It provides multiple methods for extracting metric data from whatever JVM you're monitoring, and lets you define which metrics you'd like to pipe to Ganglia/Graphite, among other options.